Feed aggregator

Partitioning – When data movement is not performed as expected

Yann Neuhaus - Thu, 2016-10-13 07:16

This blog is about an interesting partitioning story and curious data movements during merge operation. I was at my one of my customer uses intensively partitioning for various reasons including archiving and manageability. A couple of days ago, we decided to test the new fresh developed script, which will carry out the automatic archiving stuff against the concerned database in quality environment.

Let’s describe a little bit the context.

We used a range-based data distribution model. Boundary are number-based and are incremented monolithically with identity values.

We defined a partition function with range right values strategy as follows:

AS RANGE RIGHT FOR VALUES(@boundary_archive, @boundary_current)


The first implementation of the partition scheme consisted in storing all partitioned data inside the same filegroup.


Well, the initial configuration was as follows. We applied page compression to the archive partition because it contained cold data which is not supposed to be updated very frequently.

blog 106 - 0 - partitioned table starting point

blog 106 - 1 - partitioned table starting point config

As you may expect, the next step will consist in merging ARCHIVE and CURRENT partitions by using the MERGE command as follows:

MERGE RANGE (@b_archive);


Basically, we achieve a merge operation according to Microsoft documentation:

The filegroup that originally held boundary_value is removed from the partition scheme unless it is used by a remaining partition, or is marked with the NEXT USED property.


blog 106 - 2 - partitioned table after merge

At this point we may expect SQL Server has moved data from the middle partition to the left partition and according to this other Microsoft pointer

When two partitions are merged, the resultant partition inherits the data compression attribute of the destination partition.

But the results come as a little surprise because I expected to see a page compression value from my archive partition.

blog 106 - 3 - partitioned table after merge config

At this point, my assumption was either compression value doesn’t inherit correctly as mentioned in the Microsoft documentation or data movement is not performed as expected. The latter may be checked very quickly by looking at the corresponding records inside the transaction log file.

	COUNT(*) as nb_ops
FROM ::fn_dblog(NULL,NULL)
WHERE [Transaction ID] = (
     SELECT TOP 1 [Transaction ID]
     FROM ::fn_dblog(NULL,NULL)
     WHERE [Xact ID] = 164670)
GROUP BY [AllocUnitName], Operation

I put only the relevant sample here

blog 106 - 5 - log file after merge config

Referring to what I saw above, I noticed data movement was not performed as expected but rather as shown below:

blog 106 - 4 - partitioned table after merge config 2

So, this strange behavior seems to explain why compression state switched from PAGE to NONE in my case. Another strange thing is when we changed the partition scheme to include an archive filegroup, we returned to normality.



I doubled checked the Microsoft documentation to see if it exists one section to figure out the behavior I experienced in this case but without success. After further investigations I found out an interesting blog post from Sunil Agarwal (Microsoft) about partition merging and some performance improvements shipped with SQL Server 2008 R2. In short, I was in the specific context described in the blog post (same filegroup) and merging optimization came into action transparently because number of rows in the archive partition was lower than the one in the current partition at the moment of the MERGE operation.

Let me introduce another relevant information – the number of lines for each partition – to the following picture:

blog 106 - 6 - partitioned table optimization stuff

Just to be clear, this is an interesting and smart mechanism provided by Microsoft but it may be a little bit confusing when you’re not aware of this optimization. In my case, we finally decided with my customer to dedicate an archiving partition to store cold data that will be rarely accessed in this context and keep up the possibility to store cold data on cheaper storage when archiving partition will grow to a critical size. But in other cases, if storing data is the same filegroup is a still relevant scenario, keep in mind this optimization to not experience unexpected behaviors.

Happy partitioning!



Cet article Partitioning – When data movement is not performed as expected est apparu en premier sur Blog dbi services.

How can we use Oracle to deduplicate our data

Bar Solutions - Thu, 2016-10-13 05:13

Dear Patrick,

We have gone through a merger at our company where we are trying to merge the databases. The problem now is that we have duplicate records in our tables. We can of course go through all the records by hand and check if they exist twice. Another option is to build an application to do this. But using the Oracle Database there must be a better way to do this. Any ideas?

Ramesh Cunar

Dear Ramesh,

Going through all these records by hand seems like quite a time consuming process. I don’t know the number of records in your tables, but I would definitely not do this by hand. Writing an application to do this makes it possible to run the application multiple times, so you can update the rules making it better every time. Writing an application can be time consuming and running it can be as well.

I think your best shot would be to try and find a solution in either PL/SQL or SQL. Depending on the complexity and size of your data you could try any of the following solutions.

You can write a procedure that loops through your data and saves the records in an Associative array. If the record already exists in the array, or at least the key fields are equal, then it should not be added.

  CURSOR c_ot_emp IS
    SELECT e.empno
      FROM ot_emp e;
  r_ot_emp         ot_emp%ROWTYPE;
  l_unique_ot_emps t_ot_emp_aa;
  FUNCTION record_already_exists(record_in        IN ot_emp%ROWTYPE
                                ,collection_inout IN OUT t_ot_emp_aa) RETURN BOOLEAN IS
    l_indx        PLS_INTEGER := collection_inout.first;
    l_returnvalue BOOLEAN := FALSE;
    IF l_indx IS NOT NULL THEN
        l_returnvalue := l_returnvalue OR ((record_in.ename = collection_inout(l_indx).ename) AND
                         (record_in.job = collection_inout(l_indx).job));
        l_indx := collection_inout.next(l_indx);
        EXIT WHEN l_returnvalue OR(l_indx IS NULL);
      END LOOP;
    END IF;
    RETURN l_returnvalue;
  END record_already_exists;

  OPEN c_ot_emp;
    FETCH c_ot_emp
      INTO r_ot_emp;
    EXIT WHEN c_ot_emp%NOTFOUND;
    -- check if this record already exists
    IF NOT (record_already_exists(record_in => r_ot_emp, collection_inout => l_unique_ot_emps)) THEN
      l_unique_ot_emps(l_unique_ot_emps.count + 1) := r_ot_emp;
    END IF;
  FOR indx IN l_unique_ot_emps.first .. l_unique_ot_emps.last LOOP
    INSERT INTO ot_emp_deduplicated
END ot_deduplicate_aa;

You can speed up this process by using bulk processing for both the retrieval and the storing of the data. Watch out that you don’t blow up your memory by retrieving too much data at once. You can use the limit clause to prevent this.

Another way to go at this, is the use of the analytic functions in Oracle. The idea behind using analytics is to rank all the record in their own partition. This rank can be rather arbitrary so we can use ROW_NUMBER as a ranking mechanism. In the PARTITION BY clause we can add all the columns we need to check for duplicates on. And to define which record to keep we add the (obligatory) order by clause. Then, using this new column, we can state that we are just interested in the records where the ROW_NUMBER equals 1.

WITH emp_all_rows AS
 (SELECT e.empno, e.ename, e.job, e.mgr, e.hiredate, e.sal, e.comm, e.deptno
       ,row_number() OVER (PARTITION BY e.ename, e.job ORDER BY e.empno ASC) rn
    FROM ot_emp e) 
  INTO ot_emp_deduplicated(empno, ename, job, mgr, hiredate, sal, comm, deptno)
(SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno
   FROM emp_all_rows
  WHERE rn = 1)

You can of course use this same technique to delete the duplicate rows directly from the source table, but then you cannot use the query factoring (WITH clause) because that isn’t supported in SQL.

DELETE FROM ot_emp e
 WHERE e.rowid IN 
(SELECT ear.rowid
   FROM (SELECT e.rowid
               ,row_number() over(PARTITION BY e.ename ORDER BY e.empno ASC) rn
           FROM ot_emp e) ear --<-- EmpAllRows
  WHERE rn > 1)

It is probably faster to insert the de-duplicated rows into a new table, then drop the original table and rename the new table. Be aware of any triggers, constraints, indexes etc. that may exist and need to be recreated.

You can check out the full demo code at
Or you can send me an email and I will send you the demo script.

Hope this helps in your efforts to de-duplicate your data.

Happy Oracle’ing,
Patrick Barel

If you have any comments on this subject or you have a question you want answered, please send an email to patrick[at]bar-solutions[dot]com. If I know the answer, or can find it for you, maybe I can help.

This question has been published in OTech Magazine of Winter 2015.

MS16-104 Security Update for IE Breaks EBS JRE Download

Steven Chan - Thu, 2016-10-13 02:04

Microsoft IE LogoMS16-104: Security update for Internet Explorer: September 13, 2016 (KB3185319) breaks the JRE (oaj2se.exe) download function in Oracle E-Business Suite 12.1 and 12.2. This issue occurs with IE 11 on Windows 7, 8.1, and 10.

Expected behaviour: If an end-user does not have the required JRE plug-in release installed on their desktop client, a dialog box appears, asking whether they wish to download it from the EBS server. When the end-user confirms the download, the oaj2se.exe file downloads and runs.

Incorrect behaviour after the installation of KB3185319: When the end-user confirms the download via the dialog box, oaj2se.exe (the JRE) will not download.

How to fix this problem

Microsoft has published separate fixes for this issue for Windows 7, 8.1, and 10.  The appropriate patch must be installed on all affected end-user desktop clients:

Temporary workaround

If an end-user has installed KB3185319 but is unable to install the fix, end-users may work around the problem by entering the file's URL in Internet Explorer's address bar.  For example:



Related Articles

Categories: APPS Blogs

Documentum story – ADTS not working anymore?

Yann Neuhaus - Thu, 2016-10-13 02:00

A few weeks ago, on one of our Documentum environments, we find out thanks to our monitoring that the Renditions weren’t generated anymore by our CTS/ADTS Server… This happened in a Sandbox environment where a lot of dev/testing was done in parallel between EMC, the different Application Teams and the Platform/Architecture Team (us). A lot of changes at the same time means that it might not be easy to find out what caused this issue…


After a few checks on our monitoring scripts just to ensure that the issue wasn’t the monitoring itself, it appears that this part was working properly and indeed the renditions weren’t generated anymore. Therefore we checked the configuration of the docbase/rendition server but didn’t find anything suspicious on the configuration side and therefore we checked the logs of the Rendition Server. The CTS/ADTS Server often print a lot of different errors that are all linked but which appear to have a different root cause. Therefore to know which error is really relevant, I cleaned up the log file (stop CTS/ADTS, backup log file and remove it) and then I launched our monitoring script that basically remove all existing renditions for a test document, if any, and then request a new set of renditions to be generated by the CTS/ADTS Server.


After doing that, it was clear that the following error was the real one I needed to take a look at:

11:14:58,562  INFO [ Thread-61] CTSThreadPoolManagerImpl -       Added ICTSTask to the ICTSThreadPoolManager: dm_transcode_content
11:14:58,562  INFO [Thread-153] CTSThreadPoolManagerImpl -       Start. About to get Next ICTSTask from pool manager...
11:14:58,562  INFO [Thread-153] CTSThreadPoolManagerImpl -       ICTSThreadPoolManager: removing first item from the list for processing...
11:14:58,562  INFO [Thread-153] CTSThreadPoolManagerImpl -       Removing a task to execute it. Number in waiting list: 1
11:14:58,562  INFO [Thread-153] CTSThreadPoolManagerImpl -       Next CTSTask received...
11:14:58,562  INFO [Thread-153] CTSThreadPoolManagerImpl -       CTSThreadPoolManager has threadlimit -1
11:14:58,562  INFO [Thread-153] CTSThreadPoolManagerImpl -       Processing next CTSTask...
11:14:58,562  INFO [Thread-153] CTSThreadPoolManagerImpl -       About to get notifier from CTSTask...
11:14:58,562  INFO [Thread-153] CTSThreadPoolManagerImpl -       About to get session from CTSTask...
11:14:59,203  INFO [Thread-153] CTSThreadPoolManagerImpl -       About to get RUN CTSTask...
11:14:59,859  WARN [Thread-153] CTSOperationsUtils -       [BOCS/ACS] exportContentFiles error - javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: java.security.cert.CertPathBuilderException: Could not build a validated path.


11:14:59,859 ERROR [Thread-153] CTSThreadPoolManagerImpl -       Exception in CTSThreadPoolManagerImpl, notification :
com.documentum.cts.exceptions.internal.CTSServerTaskException: Error while exporting file(s) from repo01, objectName: check_ADTS_rendition_creation errorMessage: Executing Export operation failed
                Could not get the content file for check_ADTS_rendition_creation (093f245a800a303f, msw12, null)
Cause Exception was:
com.documentum.cts.exceptions.CTSException: Error while exporting file(s) from repo01, objectName: check_ADTS_rendition_creation errorMessage: Executing Export operation failed
                Could not get the content file for check_ADTS_rendition_creation (093f245a800a303f, msw12, null)
Cause Exception was:
DfException:: THREAD: Thread-153; MSG: Error while exporting file(s) from repo01, objectName: check_ADTS_rendition_creation errorMessage: Executing Export operation failed
                Could not get the content file for check_ADTS_rendition_creation (090f446e800a303f, msw12, null); ERRORCODE: ff; NEXT: null
                at com.documentum.cts.util.CTSOperationsUtils.getExportedFileEx5(CTSOperationsUtils.java:626)
                at com.documentum.cts.util.CTSOperationsUtils.getExportedFileEx4(CTSOperationsUtils.java:332)
                at com.documentum.cts.util.CTSOperationsUtils.getExportedFileEx3(CTSOperationsUtils.java:276)
                at com.documentum.cts.util.CTSOperationsUtils.getExportedFileEx2(CTSOperationsUtils.java:256)
                at com.documentum.cts.impl.services.task.CTSTask.exportInputContent(CTSTask.java:4716)
                at com.documentum.cts.impl.services.task.CTSTask.retrieveInputURL(CTSTask.java:4594)
                at com.documentum.cts.impl.services.task.CTSTask.initializeFromCommand(CTSTask.java:2523)
                at com.documentum.cts.impl.services.task.CTSTask.execute(CTSTask.java:922)
                at com.documentum.cts.impl.services.task.CTSTaskBase.doExecute(CTSTaskBase.java:514)
                at com.documentum.cts.impl.services.task.CTSTaskBase.run(CTSTaskBase.java:460)
                at com.documentum.cts.impl.services.thread.CTSTaskRunnable.run(CTSTaskRunnable.java:207)
                at java.lang.Thread.run(Thread.java:745)
11:14:59,859  INFO [Thread-153] CTSQueueProcessor -       _failureNotificationAdmin : false
11:14:59,859  INFO [Thread-153] CTSQueueProcessor -       _failureNotification : true


Ok so now it is clear that the error is actually the following one: “java.security.cert.CertPathBuilderException: Could not build a validated path”. This always means that a specific SSL Certificate Chain isn’t trusted. As you can see above, it is mentioned “BOCS/ACS” on the same line and actually the line just below contains the URL of the ACS… Therefore I thought about that and yes indeed one of the changes planned for that day was that the D2-BOCS has been installed and enabled on this environment. So what is the link between the ACS URL and the D2-BOCS installation? Well actually when installing the D2-BOCS, if you want to keep your environment secured, then you need the ACS to be switched to HTTPS because the D2-BOCS will force D2 to use the ACS URLs to download the documents to the client’s workstation when it is actually not using the ACS at all when there is no D2-BOCS installed. Therefore the installation of the D2-BOCS isn’t linked to the CTS/ADTS at all but one of our pre-requisites to install it was to setup the ACS in HTTPS and that is linked to the CTS/ADTS Server because it is actually using it to download the documents as you can see in the error above.


Now we know what was the error and just to confirm that, I switched back the ACS URL to HTTP (using DA: Administration > Distributed Content Configuration > ACS Servers > Right-click on ACS objects > Properties > ACS Server Connections) and re-init the Content Server (using DA: Administration > Basic Configuration > Content Servers > Right-click on CS objects > Properties > Check “Re-Initialize Server” and click OK).


Right after doing that, the monitoring switched back to green, meaning that renditions were created again and therefore this was indeed the one and only issue. So what to do if we want to use the ACS in HTTPS in correlation with the Rendition? Well we just have to explain to the CTS/ADTS Server that he can trust the ACS SSL Certificate and this is done by updating the cacerts file of Java used by the Rendition Server. This is done pretty easily using the following commands for which I will suppose that the Rendition Server has been installed on a D: Drive under “D:\CTS”.


So the first thing to do is to upload your Certificate Chain to the Rendition Server and put them under “D:\certs” (I will suppose there are two SSL Certificates in the chain: a Root and a Gold). Then simply start a command prompt as Administrator and execute the following commands to update the cacerts file of Java:

D:\> copy D:\CTS\java64\1.7.0_72\jre\lib\security\cacerts D:\CTS\java64\1.7.0_72\jre\lib\security\cacerts.bck_YYYYMMDD
        1 file(s) copied.

D:\> D:\CTS\java64\1.7.0_72\bin\keytool.exe -import -trustcacerts -alias root_ca -keystore D:\CTS\java64\1.7.0_72\jre\lib\security\cacerts -file D:\certs\Internal_Root_CA.cer
Enter keystore password:
Trust this certificate? [no]:  yes
Certificate was added to keystore
D:\> D:\CTS\java64\1.7.0_72\bin\keytool.exe -import -trustcacerts -alias gold_ca -keystore D:\CTS\java64\1.7.0_72\jre\lib\security\cacerts -file D:\certs\Internal_Gold_CA1.cer
Enter keystore password:
Certificate was added to keystore


Now just switch the ACS to use HTTPS again, restart the Rendition services using the Windows Services or command line tool and the next time you will request a rendition, it will work without errors even in HTTPS. That’s actually a very common mistake to setup the SSL on the Content Server side which is great but you need not to forget that some other components might use what you switch to HTTPS and therefore these additional components need to trust your SSL Certificates too!


Note 1: in our case, the WebLogic Server hosting D2 was already in HTTPS and therefore it was already trusting the Internal Root & Gold SSL Certificates, reason why we could use the ACS in HTTPS from D2 without issue.

Note 2: in case you didn’t know about it, I think it is now clear that the CTS/ADTS Server is using the ACS to download the files… Therefore if you want a secured environment even without D2-BOCS, you absolutely need to switch your ACS to HTTPS!


Cet article Documentum story – ADTS not working anymore? est apparu en premier sur Blog dbi services.

how to improve the SQL performance of an 8 node Exadata Data Warehouse RAC

Tom Kyte - Thu, 2016-10-13 00:46
Hi Tom! Last day I work with an 8 node Exadata Data Warehouse to tune performance for the slow SQL,and I found that the table is not gather statistics and the execution plan is not the best, it executes with the TABLE ACCESS STORAGE FULL,the table...
Categories: DBA Blogs

change running code's execution plan

Tom Kyte - Thu, 2016-10-13 00:46
Hi Tom, Suppose a package is executing in prod environment, currently proc A is running and after that Proc B will be executed in next 45 mins. I have just spotted some problems in proc B execution plan(which was generated earlier). I cant chang...
Categories: DBA Blogs


Tom Kyte - Thu, 2016-10-13 00:46
Hi Guys, I am using linux to run the query. I have one query like select * from Emp; I want to spool the results into one file(this file has only exported data without number of rows selected message) and what ever the log i.e number of lines ...
Categories: DBA Blogs

How to replicate a nested table

Tom Kyte - Thu, 2016-10-13 00:46
I would like to know if there is any easier way to replicate a nested table. Say for a normal table, we can just do CREATE TABLE a_bkp as SELECT * FROM a; Is there a similar easy way to create replica of existing nested table ?
Categories: DBA Blogs


Tom Kyte - Thu, 2016-10-13 00:46
Can someone please help me with query for how can I extract the 'comment' section of all the tables ? Used this query to get all the table names: Select table_name, owner, tablespace_name from dba_tables; Used below query to gather the comm...
Categories: DBA Blogs


Tom Kyte - Thu, 2016-10-13 00:46
Hi All, It seems that only indexes that are listed in the explain plan are monitored by index monitoring feature. But indexes that are used for constraint checking might not be listed in explain plan and in this case the index monitoring featur...
Categories: DBA Blogs

Customers talk to each other—and everyone wins

Linda Fishman Hoyle - Wed, 2016-10-12 20:40

A Guest Post by Oracle Senior Vice President Chris Leone, Applications Development (pictured left)

Most customer panels are conversations between the moderator and the customer. Not in Cara Capretta’s HCM session at OpenWorld 2016. Two minutes into the conversation, the HR and IT leaders from Kaiser Permanente, SGS, Siemens Corporation, MoneyGram, and IPSOS were laughing and joking with each other.

“We’re in this Together”

It turns out that the panelists knew each other from reference calls.Caroline Villemin, HCM Systems Manager of IPSOS, said her reference call with SGS was “more than just getting feedback on the solution; we really discovered the customer community and everything around it.”

Etienne Delobel, HRIS Director of SGS, described the interaction with a professional colleague this way: “When Caroline and the CFO of IPSOS came for an on-site visit, they spent half a day with us looking into our HCM platform. I told them that if they go with HCM Cloud, we'll work together, and we'll be together." He added, “There are all these great features available in the cloud. The thing is, I do not have enough resources on my team to play with them, to have fun with them. So now I'm looking at how IPSOS is implementing and using things like self-service. They’re prototyping for us.”

Customers Connect Online

In the on-premises world, customers interacted mostly with vendors and partners. Now that everyone in the SaaS world is on the same release, customers are looking more to each other for advice. You see this in online communities such as Oracle’s Customer Connect, where customers join discussion forums on topics covering CX, HCM, ERP, EPM, and more.

The HCM Cloud community alone has 20,000 active community members. In this Profit Magazine interview with Oracle SVP Chris Leone (pictured left), he points out that activity in the cloud community has grown because people are sharing information across organizations. They’re saying, “Hey, take advantage of this new succession planning feature, or this new career development tool. They’re sharing reports.”

Oracle Listens

In addition, customers use the Idea Lab to submit ideas, collaborate on development, and vote for their favorite ideas. According to Leone, 80 percent of the enhancements in Release 12 were customer-driven. HCM customers have submitted 7,200 ideas so far.

Are you part of this online community? We encourage you to sign up today.

How to install Mirantis OpenStack 9.0 using VirtualBox – part 3

Yann Neuhaus - Wed, 2016-10-12 14:20

In the two first blogs, I installed the Fuel environment and deployed OpenStack in the Fuel slave nodes and all of that from the Fuel Master node.

In this blog, I will show you  all the steps to follow in order to create an instance in OpenStack. All is going to be done via the Command Line Interface and not via Horizon – the OpenStack Dashboard (I will explain this in an other blog) .


Let’s begin!

First of all let’s connect to the Fuel Master node and list all the nodes :

[root@fuel ~]# fuel2 node list
| id | name | status | os_platform | roles | ip | mac | cluster | platform_name | online |
| 1 | Storage | ready | ubuntu | [u'cinder'] | | 08:00:27:80:04:e8 | 1 | None | True |
| 2 | Compute | ready | ubuntu | [u'compute'] | | 08:00:27:cc:85:69 | 1 | None | True |
| 3 | Controller | ready | ubuntu | [u'controller'] | | 08:00:27:35:b0:77 | 1 | None | True |


Now, I connect to the controller node

[root@fuel ~]# ssh
Warning: Permanently added '' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 14.04.5 LTS (GNU/Linux 3.13.0-98-generic x86_64)

* Documentation:  https://help.ubuntu.com/
Last login: Tue Oct 11 09:55:54 2016 from


Let’s put an alias for each of the Fuel slave nodes  :

[root@fuel ~]# cat /etc/hosts   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6       fuel.domain.tld fuel       fuel.domain.tld controller       fuel.domain.tld compute       fuel.domain.tld storage


Now I can connect with the aliases:

[root@fuel ~]# ssh controller
The authenticity of host 'controller (' can't be established.
ECDSA key fingerprint is 01:b5:15:22:03:d0:f9:bb:86:3a:06:a7:8c:19:bd:22.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'controller,' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 14.04.5 LTS (GNU/Linux 3.13.0-98-generic x86_64)

* Documentation: https://help.ubuntu.com/
Last login: Tue Oct 11 10:12:34 2016 from
root@node-3:~# exit


Repeat this step for the two left nodes (compute & storage)

I connect to the controller node :

[root@fuel ~]# ssh controller
Welcome to Ubuntu 14.04.5 LTS (GNU/Linux 3.13.0-98-generic x86_64)

* Documentation:  https://help.ubuntu.com/
Last login: Tue Oct 11 11:24:16 2016 from


I  try to enter an OpenStack Command Line (list the instances for example) :

root@node-3:~# nova list
 ERROR (CommandError): You must provide a username or user ID via --os-username, --os-user-id, env[OS_USERNAME] or env[OS_USER_ID]

It is normal because using OpenStack Clients Command Line  requires before to get a token from Keystone.

In order to do that, you have to specify where the controller node can reach Keystone and get the required informations from the  OpenStack API’s. Because it is an authentification process you will also need to specify a username and a password.

But doing this at each time you use the OpenStack Command Lines clients can  quickly become inconvenient. Hopefully, in OpenStack, there is a way to avoid this. The solution is to create a file with all environments variables that need to be exported for getting this token from Keystone.

Mirantis creates this file for us:

root@node-3:~# ls

Let’s see what this file contains :

root@node-3:~# cat openrc
export OS_NO_CACHE='true'
export OS_TENANT_NAME='admin'
export OS_PROJECT_NAME='admin'
export OS_USERNAME='admin'
export OS_PASSWORD='admin'
export OS_AUTH_URL=''
export OS_DEFAULT_DOMAIN='Default'
export OS_AUTH_STRATEGY='keystone'
export OS_REGION_NAME='RegionOne'
export NOVA_ENDPOINT_TYPE='internalURL'
export OS_ENDPOINT_TYPE='internalURL'
export MURANO_REPO_URL='http://storage.apps.openstack.org/'


I source the file to export the environment variables :

root@node-3:~# source openrc


I check if the variables were exported. It is via the OS_AUTH_URL that the controller node can reach Keystone

root@node-3:~# echo $OS_USERNAME
root@node-3:~# echo $OS_PASSWORD
root@node-3:~# echo $OS_AUTH_URL


Now I can list if there are some instances running :

root@node-3:~# nova list
| ID | Name | Status | Task State | Power State | Networks |



Let’s create the first instance

In order to create this instance, I need :

  • a flavor
  • an OS
  • a network
  • a keypair
  • a name


I list the available flavors

root@node-3:~# nova flavor-list
 | ID                                   | Name      | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |
 | 1                                    | m1.tiny   | 512       | 1    | 0         |      | 1     | 1.0         | True      |
 | 2                                    | m1.small  | 2048      | 20   | 0         |      | 1     | 1.0         | True      |
 | 3                                    | m1.medium | 4096      | 40   | 0         |      | 2     | 1.0         | True      |
 | 4                                    | m1.large  | 8192      | 80   | 0         |      | 4     | 1.0         | True      |
 | 5                                    | m1.xlarge | 16384     | 160  | 0         |      | 8     | 1.0         | True      |
 | d942587a-c48b-48ca-9c96-cad3c358eb6e | m1.micro  | 64        | 0    | 0         |      | 1     | 1.0         | True      |


Then, I list the Operating Systems or images available. There is only one image created by default by Fuel which is a mini Linux, it is called Cirros.

root@node-3:~# nova image-list
| ID                                   | Name   | Status | Server |
| a3708fe7-60f7-49c9-91ed-a6eee1ab8ba4 | TestVM | ACTIVE |        |


I list also the networks (they had been created by Fuel during the deployment of OpenStack)

root@node-3:~# neutron net-list
| id                                   | name               | subnets                                               |
| b22e82c9-df6b-4580-a77e-cde8e93f30d8 | admin_floating_net | 7acc6b15-1c00-4447-b4f7-0fcced7a594b    |
| 09b1e122-cb63-44d5-af0b-244d3aa06331 | admin_internal_net | aea6fc29-dfb6-4586-9b09-70ce5c992315 |


I create a keypair that will be injected in the future instance in order to avoid a password authentification (unless if a password authentification was set up).. This step is not useful for this case because the Cirros image provides a password by default. But it can be helpful if you use other cloud images (ubuntu, centos, etc..)

root@node-3:~# nova keypair-add --pub-key ~/.ssh/authorized_keys mykey
root@node-3:~# nova keypair-list
| Name  | Type | Fingerprint                                     |
| mykey | ssh  | ee:56:e6:c0:7b:e2:d5:2b:61:23:d7:76:49:b3:d8:d5 |


Now I got all the informations that I need, let’s create the instance which I will name InstanceTest01

root@node-3:~# nova boot \
> --flavor m1.micro \
> --image TestVM \
> --key-name mykey \
> --nic net-id=09b1e122-cb63-44d5-af0b-244d3aa06331 \
> InstanceTest01
| Property | Value |
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | |
| OS-EXT-SRV-ATTR:host | - |
| OS-EXT-SRV-ATTR:hostname | instancetest01 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | - |
| OS-EXT-SRV-ATTR:instance_name | instance-00000001 |
| OS-EXT-SRV-ATTR:kernel_id | |
| OS-EXT-SRV-ATTR:launch_index | 0 |
| OS-EXT-SRV-ATTR:ramdisk_id | |
| OS-EXT-SRV-ATTR:reservation_id | r-dyo0086w |
| OS-EXT-SRV-ATTR:root_device_name | - |
| OS-EXT-SRV-ATTR:user_data | - |
| OS-EXT-STS:power_state | 0 |
| OS-EXT-STS:task_state | scheduling |
| OS-EXT-STS:vm_state | building |
| OS-SRV-USG:launched_at | - |
| OS-SRV-USG:terminated_at | - |
| accessIPv4 | |
| accessIPv6 | |
| adminPass | bUQJDtwM3vjr |
| config_drive | |
| created | 2016-10-11T13:08:34Z |
| description | - |
| flavor | m1.micro (d942587a-c48b-48ca-9c96-cad3c358eb6e) |
| hostId | |
| host_status | |
| id | b84b49f1-1b01-4aa9-bd9c-c8691fca9298 |
| image | TestVM (a3708fe7-60f7-49c9-91ed-a6eee1ab8ba4) |
| key_name | mykey |
| locked | False |
| metadata | {} |
| name | InstanceTest01 |
| os-extended-volumes:volumes_attached | [] |
| progress | 0 |
| security_groups | default |
| status | BUILD |
| tenant_id | abfec6fc54c14da28f9971e04c344ec8 |
| updated | 2016-10-11T13:08:36Z |
| user_id | d0e1e11f84064f4c8aa02381f0d42ed2 |

Here is the instance running :

root@node-3:~# nova list
| ID | Name | Status | Task State | Power State | Networks |
| b84b49f1-1b01-4aa9-bd9c-c8691fca9298 | InstanceTest01 | ACTIVE | - | Running | admin_internal_net= |

So the instance is up and running


We connect to the compute node to check if the instance is really running on it:

Connection to controller closed.
[root@fuel ~]# ssh compute
Welcome to Ubuntu 14.04.5 LTS (GNU/Linux 3.13.0-98-generic x86_64)

* Documentation: https://help.ubuntu.com/
Last login: Tue Oct 11 18:07:23 2016 from
root@node-2:~# virsh list
Id Name State
2 instance-00000001 running


Yes, the instance is running.

I need to add the ssh rule for accessing the instance, I will add also the icmp one. These rules are managed by security groups.

Let’s try to ping the instance just created, from now all the operations are done on the controller node  :

 root@node-3:~# ip netns exec qrouter-0c67b78a-93d4-417c-9ad3-3b29e1480934 ping
 PING ( 56(84) bytes of data.
 --- ping statistics ---
 3 packets transmitted, 0 received, 100% packet loss, time 2004ms


I can not ping my instance, let’s see what rules are currently in the default security group created by OpenStack:

root@node-3:~# nova secgroup-list
 | Id                                   | Name    | Description            |
 | 954e4d85-ea38-4a4b-bbe6-e355946fdfb0 | default | Default security group |
root@node-3:~# nova secgroup-list-rules 954e4d85-ea38-4a4b-bbe6-e355946fdfb0
| IP Protocol | From Port | To Port | IP Range | Source Group |
|                                             | | | | default |
|                                             | | | | default |

There are no rules. So, I add the icmp rule in this default security group

root@node-3:~# nova secgroup-add-rule 954e4d85-ea38-4a4b-bbe6-e355946fdfb0 icmp -1 -1
 | IP Protocol | From Port | To Port | IP Range | Source Group |
 | icmp | -1 | -1 | | |


I check if the rule was added

root@node-3:~# nova secgroup-list-rules 954e4d85-ea38-4a4b-bbe6-e355946fdfb0
 | IP Protocol | From Port | To Port | IP Range  | Source Group |
 |             |           |         |           | default      |
 |             |           |         |           | default      |
 | icmp        | -1        | -1      | |              |


Let’s test if I can ping the instance, we need to use the network namespace qrouter which is basically the router that the controller node will use to reach the instance on the compute node

root@node-3:~# ip netns list




root@node-3:~# ip netns exec qrouter-0c67b78a-93d4-417c-9ad3-3b29e1480934 ping
 PING ( 56(84) bytes of data.
 64 bytes from icmp_seq=1 ttl=64 time=2.29 ms
 64 bytes from icmp_seq=2 ttl=64 time=0.789 ms
 --- ping statistics ---
 2 packets transmitted, 2 received, 0% packet loss, time 1001ms


I do the same with the ssh rule

First, I test if I can access my instance

root@node-3:~# ip netns exec qrouter-0c67b78a-93d4-417c-9ad3-3b29e1480934 ssh cirros@ -v
 OpenSSH_6.6.1, OpenSSL 1.0.1f 6 Jan 2014
 debug1: Reading configuration data /etc/ssh/ssh_config
 debug1: /etc/ssh/ssh_config line 19: Applying options for *
 debug1: Connecting to [] port 22.


No I can not, so I add the rule

root@node-3:~# nova secgroup-add-rule  954e4d85-ea38-4a4b-bbe6-e355946fdfb0 tcp 22 22
| IP Protocol | From Port | To Port | IP Range  | Source Group |
| tcp         | 22        | 22      | |              |


I check if the rule was added :

root@node-3:~# nova secgroup-list-rules 954e4d85-ea38-4a4b-bbe6-e355946fdfb0
| IP Protocol | From Port | To Port | IP Range | Source Group |
| | | | | default |
| tcp | 22 | 22 | | |
| | | | | default |
| icmp | -1 | -1 | | |

I connect to my instance, the username by default is cirros

root@node-3:~# ip netns exec qrouter-0c67b78a-93d4-417c-9ad3-3b29e1480934 ssh cirros@ -v
OpenSSH_6.6.1, OpenSSL 1.0.1f 6 Jan 2014
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: Connecting to [] port 22.
debug1: Connection established.
debug1: permanently_set_uid: 0/0
debug1: identity file /root/.ssh/id_rsa type -1
debug1: identity file /root/.ssh/id_rsa-cert type -1
debug1: identity file /root/.ssh/id_dsa type -1
debug1: identity file /root/.ssh/id_dsa-cert type -1
debug1: identity file /root/.ssh/id_ecdsa type -1
debug1: identity file /root/.ssh/id_ecdsa-cert type -1
debug1: identity file /root/.ssh/id_ed25519 type -1
debug1: identity file /root/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.8
debug1: Remote protocol version 2.0, remote software version dropbear_2015.67
debug1: no match: dropbear_2015.67
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-ctr hmac-md5 none
debug1: kex: client->server aes128-ctr hmac-md5 none
debug1: sending SSH2_MSG_KEX_ECDH_INIT
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: ECDSA 1b:f1:1c:34:13:83:cd:b1:37:a9:e4:32:37:65:91:c4
The authenticity of host ' (' can't be established.
ECDSA key fingerprint is 1b:f1:1c:34:13:83:cd:b1:37:a9:e4:32:37:65:91:c4.
Are you sure you want to continue connecting (yes/no)? yes



Type yes, and the password is cubswin:)

Warning: Permanently added '' (ECDSA) to the list of known hosts.
debug1: ssh_ecdsa_verify: signature correct
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,password
debug1: Next authentication method: publickey
debug1: Trying private key: /root/.ssh/id_rsa
debug1: Trying private key: /root/.ssh/id_dsa
debug1: Trying private key: /root/.ssh/id_ecdsa
debug1: Trying private key: /root/.ssh/id_ed25519
debug1: Next authentication method: password
cirros@'s password: 
debug1: Authentication succeeded (password).
Authenticated to ([]:22).
debug1: channel 0: new [client-session]
debug1: Entering interactive session.
debug1: Sending environment.
debug1: Sending env LANG = en_US.UTF-8
$ # "I am connected to my instance"

So now, I am connected to my instance


Let’s check if I can ping google

$ ping www.google.com
PING www.google.com ( 56 data bytes
64 bytes from seq=0 ttl=59 time=29.485 ms
64 bytes from seq=1 ttl=59 time=9.089 ms
64 bytes from seq=2 ttl=59 time=27.027 ms
64 bytes from seq=3 ttl=59 time=9.992 ms
64 bytes from seq=4 ttl=59 time=26.734 ms

This ended this series. Mirantis made OpenStack very simple to install but it requires good skills in Puppet and Astute if you want to customize the installation (like create a network node role or customize the installation of MySQL database in the controller node for example)

In other blogs, I will show you how to add nodes to the Fuel environment to make the cloud more powerful.


Cet article How to install Mirantis OpenStack 9.0 using VirtualBox – part 3 est apparu en premier sur Blog dbi services.

Court Orders Rimini Street to Stop Unlawful Conduct and Awards Oracle $27.7 Million in Prejudgment Interest

Oracle Press Releases - Wed, 2016-10-12 14:00
Press Release
Court Orders Rimini Street to Stop Unlawful Conduct and Awards Oracle $27.7 Million in Prejudgment Interest

Redwood Shores, Calif.—Oct 12, 2016

Yesterday, the United States District Court for the District of Nevada issued two significant rulings in Oracle’s litigation against Rimini Street and its CEO Seth Ravin.

The first ruling is a permanent injunction barring Rimini Street from continuing to infringe Oracle’s copyrights and other unlawful acts. The Court previously determined that a permanent injunction was warranted, noting Rimini Street’s “callous disregard for Oracle’s copyrights and computer systems when it engaged in the infringing conduct” and that “Rimini’s business model was built entirely on its infringement of Oracle’s copyrighted software and its improper access and downloading of data from Oracle’s website and computer systems.”

The Court’s four-page permanent injunction prohibits certain copying, distribution, and use of Oracle’s copyrighted software and documentation by Rimini Street and also imposes limitations on Rimini Street’s access to Oracle’s websites. The order states, for example, that Rimini Street may not use a customer’s software environment “to develop or test software updates or modifications for the benefit of any other licensee.” The order also prohibits Rimini Street’s preparation of certain derivative works from Oracle’s copyrighted software. The order applies not only to Rimini Street, but also “its subsidiaries, affiliates, employees, directors, officers, principals, and agents.”

“This permanent injunction imposes important restrictions on Rimini Street,” said Oracle’s General Counsel Dorian Daley, “and Oracle is grateful that the Court has taken steps to prevent continuing unlawful acts by Rimini Street and its executives.”

Although Rimini Street has stated that there was “no expected impact” from any injunction, Rimini Street also told the Court that it “could suffer significant harm to its current business practices if the proposed injunction were entered,” which it now has been. These contradictory positions raise the issue of whether Rimini is misleading the Court or making misrepresentations to customers.

In its second ruling, the Court awarded Oracle approximately $27.7 million in prejudgment interest, with an additional amount of prejudgment interest to be awarded based on the date of the final judgment. This is in addition to the $50 million jury verdict and the $46.2 million in attorneys’ fees and costs awarded to Oracle last month.

Contact Info
Deborah Hellinger
About Oracle

Oracle offers a comprehensive and fully integrated stack of cloud applications and platform services. For more information about Oracle (NYSE:ORCL), visit www.oracle.com.


Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Safe Harbor

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle Corporation. 

Talk to a Press Contact

Deborah Hellinger

  • +1.212.508.7935

Presenting at KScope can change your life

Dietmar Aust - Wed, 2016-10-12 13:16
Back in 2008 I was invited by Scott Spendolini to speak about my experiences with Oracle APEX at the German telecom shops at the KScope conference in New Orleans.

It was a truly great experience. 

At that time the system I had built was one of the larger deployments of Oracle APEX known in the community serving 6.000 users in the 800 telecom shops in Germany with 400.000 hits per day on the system.

Not only could I showcase what I had built with Oracle APEX 2.2 but I could even get more connected with the Oracle APEX community. It was awesome.

I had the chance to meet many people from the Oracle forums in person, like Scott Spendolini, Patrick Wolf, Dimitri Gielis, Roel Hartman, Karen Cannell, Anton Nielsen, John Scott but also many people from the APEX team itself like Mike Hichwa, Joel Kallman, Carl Backstrom, David Peake, Scott Spadafore and many others.

On this conference you will certainly meet the most members of the APEX team itself. It is a great opportunity to get in touch with the people in person ... this just makes a HUGE difference. So whenever you might need support later on ... your name will be known and people understand that you contribute just as well. It makes a difference !

From that time on in 2008 I got hooked and got more and more involved in the community. Each project became an opportunity to research yet another interesting topic for the next conference.

And I presented each year (I missed Long Beach ... what a bummer :( ) until now ... and I will submit again this week ... because the deadline is coming close ( 14.10.2016 ) !!!

This conference sets itself apart not only by the extremely high class content but also by the social activities, there is soo much going on. They always select really nice locations, everybody is relaxed and has so much fun. The APEX presentations are typically well attended ... even more than from the other tracks.

I can only encourage you to submit your presentation since the deadline is coming up soon: 14.10.2016 !!!

In order to submit a really compelling presentation, please read up these posts for guidance:
And remember ... we need presenters on all levels: beginners, intermediates and experts.

Hope to see you in San Antonio at KScope 2017 !!!


Oracle Documents Cloud Service Named in the Constellation Research ShortList for Enterprise File Sharing Suites

WebCenter Team - Wed, 2016-10-12 10:17

The Constellation ShortList presents vendors in different categories of the market relevant to early adopters. In addition, products included meet the threshold criteria for this category as determined by Constellation Research.

This Constellation ShortList of vendors for a market category is compiled through conversations with early adopter clients, independent analysts, and briefings with vendors and partners.

About the Enterprise File Sharing Suites Constellation ShortList

Enterprise file sharing and synchronization solutions allow individuals and teams to share and synchronize documents, photos, videos and other files across multiple devices in a secure manner. This category refers to people collaborating on and sharing business content in a distributed approach, allowing for real-time editing and file distribution as needed. The most advanced vendors enable developers to create business, content-centric, line-of-business applications that incorporate files as part of that process.

“Source:  Constellation Research, Inc., " Constellation Research ShortList for Enterprise File Sharing Suites", Alan Lepofsky, October 12, 2016".

Oracle Documents Cloud Service was included in the Constellation Research ShortList for Enterprise File Sharing Suites.

In today’s world, your employees need instant access to the content that drives your business. Whether they are at their desk, on the go, or working remotely, the ability to easily and securely access and collaborate on business content is one of the keys to being a nimble enterprise. Oracle Documents Cloud Service, an enterprise content collaboration solution with rich social features, gives you the file sync and share capabilities your employees need while providing the control and security your IT organization requires. All of the content can be accessed and collaborated on in context of business workflows, business applications and office productivity tools.

Oracle Documents Cloud Service lets you easily store your business content in the cloud, securely access it from anywhere, and share it with your colleagues and partners in real time. With a fast, intuitive web interface and easy-to-use desktop and mobile applications, employees can view and collaborate on files even when offline, keeping your organization running efficiently and your employees staying productive from any location.

Oracle Documents Cloud Service was pleased to be included in the Constellation Research ShortList for Enterprise File Sharing Suites. You can read the full report here.

Streaming data from Oracle using Oracle GoldenGate and Kafka Connect

Rittman Mead Consulting - Wed, 2016-10-12 10:00

This article was also posted on the Confluent blog, head over there for more great Kafka-related content!

Kafka Connect is part of the Confluent Platform, providing a set of connectors and a standard interface with which to ingest data to Kafka, and store or process it the other end. Initially launched with a JDBC source and HDFS sink, the list of connectors has grown to include a dozen certified connectors, and twice as many again 'community' connectors. These cover technologies such as MongoDB, InfluxDB, Kudu, MySQL - and of course as with any streaming technology, twitter, the de-facto source for any streaming how-to. Two connectors of note that were recently released are for Oracle GoldenGate as a source, and Elasticsearch as a sink. In this article I'm going to walk through how to set these up, and demonstrate how the flexibility and power of the Kafka Connect platform can enable rapid changes and evolutions to the data pipeline.

The above diagram shows an overview of what we're building. Change Data Capture (CDC) on the database streams every single change made to the data over to Kafka, from where it is streamed into Elasticsearch. Once in Elasticsearch it can be viewed in tools search as Kibana, for search and analytics:

Oracle GoldenGate (OGG) is a realtime data replication tool, falling under the broad umbrella of Change Data Capture (CDC) software, albeit at the high end in terms of functionality. It supports multiple RDBMS platforms, including - obviously - Oracle, as well as DB2, MySQL, and SQL Server. You can find the full certification list here. It uses log-based technology to stream all changes to a database from source, to target - which may be another database of the same type, or a different one. It is commonly used for data integration, as well as replication of data for availability purposes.

In the context of Kafka, Oracle GoldenGate provides a way of streaming all changes made to a table, or set of tables, and making them available to other processes in our data pipeline. These processes could include microservices relying on an up-to-date feed of data from a particular table, as well as persisting a replica copy of the data from the source system into a common datastore for analysis alongside data from other systems.

Elasticsearch is an open-source distributed document store, used heavily for both search, and analytics. It comes with some great tools including Kibana for data discovery and analysis, as well as a Graph tool. Whilst Elasticsearch is capable of being a primary data store in its own right, it is also commonly used as a secondary store in order to take advantage of its rapid search and analytics capabilities. It is the latter use-case that we're interested in here - using Elasticsearch to store a copy of data produced in Oracle.

Confluent's Elasticsearch Connector is an open source connector plug-in for Kafka Connect that sends data from Kafka to Elasticsearch. It is highly efficient, utilising Elasticsearch's bulk API. It also supports all Elasticsearch's data types which it automatically infers, and evolves the Elasticsearch mappings from the schema stored in Kafka records.

Oracle GoldenGate can be used with Kafka to directly stream every single change made to your database. Everything that happens in the database gets recorded in the transaction log (OGG or not), and OGG takes that ands sends it to Kafka. In this blog we're using Oracle as the source database, but don't forget that Oracle GoldenGate supports many sources. To use Oracle GoldenGate with Kafka, we use the "Oracle GoldenGate for Big Data" version (which has different binaries). Oracle GoldenGate has a significant advantage over the JDBC Source Connector for Kafka Connect in that it is a 'push' rather than periodic 'pull' from the source, thus it :

  1. Has much lower latency
  2. Requires less resource on the source database, since OGG mines the transaction log instead of directly querying the database for changes made based on a timestamp or key.
  3. Scales better, since entire schemas or whole databases can be replicated with minimal configuration changes. The JDBC connector requires each table, or SQL statement, to be specified.

Note that Oracle Golden Gate for Big Data also has its own native Kafka Handler, which can produce data in various formats directly to Kafka (rather than integrating with the Kafka Connect framework).


I'm using the Oracle BigDataLite VM 4.5 as the base machine for this. It includes Oracle 12c, Oracle GoldenGate for Big Data, as well as a CDH installation which provides HDFS and Hive for us to also integrate with later on.

On to the VM you need to also install:

  • Confluent Plaform 3.0
  • Oracle GoldenGate Kafka Connect connector
  • Elasticsearch Kafka Connect connector
  • Elasticsearch 2.4

To generate the schema and continuous workload, I used Swingbench 2.5.

For a step-by-step guide on how to set up these additional components, see this gist.

Starting Confluent Platform

There are three processes that need starting up, and each retains control of the session, so you'll want to use screen/tmux here, or wrap the commands in nohup [.. command ..] & so that they don't die when you close the window.

On BigDataLite the Zookeeper service is already installed, and should have started at server boot:

[oracle@bigdatalite ~]$ sudo service zookeeper-server status
zookeeper-server is running

If it isn't running, then start it with sudo service zookeeper-server start.

Next start up Kafka:

# On BigDataLite I had to remove this folder for Kafka to start
sudo rm -r /var/lib/kafka/.oracle_jre_usage
sudo /usr/bin/kafka-server-start /etc/kafka/server.properties

and finally the Schema Registry:

sudo /usr/bin/schema-registry-start /etc/schema-registry/schema-registry.properties

Note that on BigDataLite the Oracle TNS Listener is using port 8081 - the default for the Schema Registry - so I amended /etc/schema-registry/schema-registry.properties to change



Configuring Oracle GoldenGate to send transactions to Kafka Connect

Oracle GoldenGate (OGG) works on the concept of an Extract process which reads the source-specific transaction log and writes an OGG trail file in a generic OGG format. From this a Replicat process reads the trail file and delivers the transactions to the target.

In this example we'll be running the Extract against Oracle database, specifically, the SOE schema that Swingbench generated for us - and which we'll be able to generate live transactions against using Swingbench later on.

The Replicat will be sending the transactions from the trail file over to Kafka Connect.

I'm assuming here that you've already successfully defined and set running an extract against the Swingbench schema (SOE), with a trail file being delivered to /u01/ogg-bd/dirdat. For a step-by-step guide on how to do this all from scratch, see here.

You can find information about the OGG-Kafka Connect adapter in the README here.

To use it, first configure the replicat and supporting files as shown.

  1. Replicat parameters

    Create /u01/ogg-bd/dirprm/rconf.prm with the following contents:

    REPLICAT rconf
    TARGETDB LIBFILE libggjava.so SET property=dirprm/conf.props
    MAP *.*.*, TARGET *.*.*;
  2. Handler configuration

    Edit the existing /u01/ogg-bd/dirprm/conf.props and amend gg.classpath as shown below. The classpath shown works for BigDataLite - on your own environment you need to make the necessary jar files available per the dependencies listed in the README.

    #The handler properties
    #The formatter properties
    #Set the classpath here
    javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=.:ggjava/ggjava.jar:./dirprm

    Note the gg.log.level setting - this can be very useful to switch to DEBUG if you're investigating problems with the handler.

  3. Kafka Connect settings

    Edit the existing /u01/ogg-bd/dirprm/confluent.properties and amend the schema.registry.url URL to reflect the port change made above. All other values can be left as defaults.


Now we can add the replicat. If not already, launch ggsci from the ogg-bd folder:

cd /u01/ogg-bd/
rlwrap ./ggsci

and define the replicat, and start it


Check its status:

GGSCI (bigdatalite.localdomain) 13> INFO RCONF

REPLICAT   RCONF     Last Started 2016-09-02 15:39   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:02 ago)
Process ID           25415
Log Read Checkpoint  File ./dirdat/rt000000000
                     First Record  RBA 0

Note that on BigDataLite 4.5 VM there are two existing replicats configured, RKAFKA and RMOV. You can ignore these, or delete them if you want to keep things simple and clear.

Testing the Replication

We'll run Swingbench in a moment to generate some proper throughput, but let's start with a single transaction to check things out.

Connect to Oracle and insert a row, not forgetting to commit the transaction (he says, from frustrating experience ;) )

[oracle@bigdatalite ogg]$ sqlplus soe/soe@orcl

SQL*Plus: Release Production on Fri Sep 2 15:48:18 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Fri Sep 02 2016 12:48:22 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> insert into soe.logon values (42,42,sysdate);

1 row created.

SQL> commit;

Commit complete.

SQL> select * from soe.logon where logon_id=42;

---------- ----------- ---------
42       50865 12-AUG-11
42          42 02-SEP-16

Now if you list the topics defined within Kafka, you should see a new one has been created, for the SOE.LOGON table:

[oracle@bigdatalite dirrpt]$ kafka-topics --zookeeper localhost:2181 --list
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/share/java/kafka/slf4j-log4j12-1.7.21.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

and you can view the record:

[oracle@bigdatalite dirrpt]$ kafka-console-consumer --zookeeper localhost:2181 --topic ORCL.SOE.LOGON --from-beginning

{"schema":{"type":"struct","fields":[{"type":"string","optional":false,"field":"table"},{"type":"string","optional":false,"field":"op_type"},{"type":"string","optional":false,"field":"op_ts"},{"type":"string","optional":false,"field":"current_ts"},{"type":"string","optional":false,"field":"pos"},{"type":"double","optional":true,"field":"LOGON_ID"},{"type":"double","optional":true,"field":"CUSTOMER_ID"},{"type":"string","optional":true,"field":"LOGON_DATE"}],"optional":false,"name":"ORCL.SOE.LOGON"},"payload":{"table":"ORCL.SOE.LOGON","op_type":"I","op_ts":"2016-09-02 14:56:26.000411","current_ts":"2016-09-02 15:56:34.111000","pos":"00000000000000002010","LOGON_ID":42.0,"CUSTOMER_ID":42.0,"LOGON_DATE":"2016-09-02:15:56:25"}}

Hit Ctrl-C to cancel the consumer -- otherwise it'll sit there and wait for additional messages to be sent to the topic. Useful for monitoring when we've got lots of records flowing through, but not so useful now.

The message is JSON, so a useful tool to install is jq:

sudo yum install -y jq

You can then pipe the output of kafka-console-consumer through jq to pretty-print it:

[oracle@bigdatalite dirrpt]$ kafka-console-consumer --zookeeper localhost:2181 --topic ORCL.SOE.LOGON --from-beginning|jq '.'

  "payload": {
    "LOGON_DATE": "2016-09-02:15:56:25",
    "CUSTOMER_ID": 42,
    "LOGON_ID": 42,
    "pos": "00000000000000002010",
    "current_ts": "2016-09-02 15:56:34.111000",
    "op_ts": "2016-09-02 14:56:26.000411",
    "op_type": "I",
    "table": "ORCL.SOE.LOGON"

or even show just sections of the message using jq's syntax (explore it here):

[oracle@bigdatalite dirrpt]$ kafka-console-consumer --zookeeper localhost:2181 --topic ORCL.SOE.LOGON --from-beginning|jq '.payload.op_ts'

"2016-09-02 14:56:26.000411"

So we've got successful replication of Oracle transactions into Kafka, via Oracle GoldenGate. Now let's bring Elasticsearch into the mix.

Configuring Elasticsearch

We're going to use Elasticsearch as a destination for storing the data coming through Kafka from Oracle. Each Oracle table will map to a separate Elasticsearch index. In Elasticsearch an 'index' is roughly akin to an RDBMS table, a 'document' to a row, a 'field' to a column, and a 'mapping' to a schema.

Elasticsearch itself needs no configuration out of the box if you want to just get up and running with it, you simply execute it:


Note that this wouldn't suffice for a Production deployment, in which you'd want to allocate heap space, check open file limits, configure data paths, and so on.

With Elasticsearch running, you can then load Kopf, which is a web-based admin plugin. You'll find it at http://<server>:9200/_plugin/kopf

From Kopf you can see which nodes there are in the Elasticsearch cluster (just the one at the moment, with a random name inspired by Marvel), along with details of the indices as they're created - in the above screenshot there are none yet, because we've not loaded any data.

Setting up the Elasticsearch Kafka Connect handler

Create a configuration file for the Elasticsearch Kafka Connect handler. I've put it in with the Elasticsearch configuration itself at /opt/elasticsearch-2.4.0/config/elasticsearch-kafka-connect.properties; you can use other paths if you want.

The defaults mostly suffice to start with, but we do need to update the topics value:

# Custom config

Because Elasticsearch indices cannot be uppercase, we need to provide a mapping from the Kafka topic to the Elasticsearch index, so add a configuration to the file:


If you don't do this you'll get an InvalidIndexNameException. You also need to add

topic.key.ignore = ORCL.SOE.LOGON

Note that the global key.ignore is currently ignored if you are also overriding another topic parameter. If you don't set this flag for the topic, you'll get org.apache.kafka.connect.errors.DataException: STRUCT is not supported as the document id..

Now we can run our connector. I'm setting the CLASSPATH necessary to pick up the connector itself, as well as the dependecies. I also set JMX_PORT so that the metrics are exposed on JMX for helping with debug/monitoring.

export CLASSPATH=/opt/kafka-connect-elasticsearch/*
export JMX_PORT=4243
/usr/bin/connect-standalone /etc/kafka/connect-standalone.properties /opt/elasticsearch-2.4.0/config/elasticsearch-kafka-connect.properties

You'll not get much from the console after the initial flurry of activity, except:

[pool-2-thread-1] INFO org.apache.kafka.connect.runtime.WorkerSinkTask - WorkerSinkTask{id=elasticsearch-sink-0} Committing offsets

But if you head over to Elasticsearch you should now have some data. In Kopf you'll see that there are now 'documents' in the index:

In addition the header bar of Kopf has gone a yellow/gold colour, because your Elasticsearch cluster is now in "YELLOW" state - we'll come back to this and the cause (unassigned shards) shortly.

Interactions with Elasticsearch are primarily through a REST API, which you can use to query the number of records in an index:

[oracle@bigdatalite ~]$ curl -s -X "GET" "http://localhost:9200/soe.logon/_count?pretty=true"
"count" : 4,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0

and you can pair it up with jq as above to select just one of the fields:

$ curl -s -X "GET" "http://localhost:9200/soe.logon/_count?pretty=true"|jq '.count'

To see the data itself:

$ curl -s -X "GET" "http://localhost:9200/soe.logon/_search?pretty=true"
  "took" : 25,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  "hits" : {
    "total" : 4,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "soe.logon",
      "_type" : "kafka-connect",
      "_id" : "ORCL.SOE.LOGON+0+3",
      "_score" : 1.0,
      "_source" : {
    "table" : "ORCL.SOE.LOGON",
    "op_type" : "I",
    "op_ts" : "2016-09-05 14:46:16.000436",
    "current_ts" : "2016-09-05 15:46:21.860000",
    "pos" : "00000000000000002748",
    "LOGON_ID" : 42.0,
    "CUSTOMER_ID" : 42.0,
    "LOGON_DATE" : "2016-09-05:15:46:11"

This is looking good! But ... there's a wrinkle. Let's fire up Kibana, an analytical tool for data in Elasticsearch, and see why.


Go to http://<server>:5601/ and the first thing you'll see (assuming this is the first time you've run Kibana) is this:

Elasticseach, Index Mappings, and Dynamic Templates

Kibana is a pretty free-form analysis tool, and you don't have to write SQL, define dimensions, and so on -- but what you do have to do is tell it where to find the data. So let's specify our index name, which in this example is soe.logon:

Note that the Time-field name remains blank. If you untick Index contains time-based events and then click Create you'll see the index fields and their types:

Columns that are timestamps are coming across as strings - which is an issue here, because Time is one of the dimensions by which we'll pretty much always want to analyse data, and if it's not present Kibana (or any other user of the Elasticsearch data) can't do its clever time-based filtering and aggregation, such as this example taken from another (time-based) Elasticsearch index:

As a side note, the schema coming through from OGG Kafka Connect connector is listing these timestamp fields as strings, as we can see with a bit of fancy jq processing to show the schema entry for one of the fields (op_ts):

$ kafka-console-consumer --zookeeper localhost:2181 --topic ORCL.SOE.LOGON --from-beginning --max-messages 1|jq '.schema.fields[] | select (.field | contains("op_ts"))'
    "field": "op_ts",
    "optional": false,
    "type": "string"

This string-based schema is actually coming through from the OGG replicat itself - whilst the Kafka Connect handler interprets and assumes the datatypes of columns such as numbers, it doesn't for timestamps.

So - how do we fix these data types in Elasticsearch so that we can make good use of the data? Enter Dynamic Templates. These enable you to specify the mapping (similar to a schema) of an index prior to it being created for a field for the first time, and you can wildcard field names too so that, for example, anything with a _ts suffix is treated as a timestamp data type.

To configure the dynamic template we'll use the REST API again, and whilst curl is fine for simple and repeated command line work, we'll switch to the web-based Elasticsearch REST API client, Sense. Assuming that you installed it following the process above you can access it at http://<server>:5601/app/sense.

Click Get to work to close the intro banner, and in the main editor paste the following JSON (gist here)

DELETE /_template/kafkaconnect/
PUT /_template/kafkaconnect/
  "template": "soe*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  "mappings": {
    "_default_": {
      "dynamic_templates": [
          "dates": {
            "match": "*_ts",
            "mapping": {
              "type": "date",
              "format": "YYYY-MM-dd HH:mm:ss.SSSSSS"
          "non_analysed_string_template": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed"

What this is doing:

  1. The DELETE is just there so that you can re-run these statements, since Elasticsearch won't update an existing template.
  2. Any index beginning with soe will be matched against this template.
  3. This is based on a single-node Elasticsearch instance, so setting the number of replicas to zero, and shards to one. In a multi-node Production cluster you'd want to set these differently. If you leave replicas as the default (1) then your Elasticsearch cluster will remain in "YELLOW" health status as there'll forever be unassigned shards.
  4. The dates template matches any field with _ts suffix and sets it to a Date type. The inbound data must match the format shown. For details of the date format specifics, see the JodaTime documentation.
  5. The non_analysed_string_template template matches any string field and creates two instances of it; one analyzed and one not. Analyzed is where it gets tokenized which is useful for full-text searching etc, and non-analyzed is necessary for aggregations against the full field value. For example, "New York" would otherwise aggregate as 'New' and a separate instance 'York'.

Put the cursor over each statement and click the green play arrow that appears to the right of the column.

For the DELETE statement you'll get an error the first time it's run (because the index template isn't there to delete), and the PUT should succeed with

    "acknowledged": true

Now we'll delete the index itself so that it can be recreated and pick up the dynamic mappings. Here I'm using curl but you can run this in Sense too if you want.

$ curl -X "DELETE" "http://localhost:9200/soe.logon"

Watch out here, because Elasticsearch will delete an index before you can say 'oh sh....' -- there is no "Are you sure you want to drop this index?" type interaction. You can even wildcard the above REST request for real destruction and mayhem - action.destructive_requires_name can be set to limit this risk.

So, to recap - we've successfully run Kafka Connect to load data from a Kafka topic into an Elasticsearch index. We've taken that index and seen that the field mappings aren't great for timestamp fields, so have defined a dynamic template in Elasticsearch so that new indices created will set any column ending _ts to a timestamp. Finally, we deleted the existing index so that we can use the new template from now on.

Let's test out the new index mapping. Since we deleted the index that had our data in (albeit test data) we can take advantage of the awesomeness that is Kafka by simply replaying the topic from the start. To do this change the name value in the Elasticsearch connector configuration (elasticsearch-kafka-connect.properties), e.g. add a number to its suffix:


If you're running Kafka Connect in standalone mode then you could also just delete the offsets file to achieve the same.

Whilst in the configuration file you need to also add another entry, telling the connector to ignore the schema that is passed from Kafka and instead dynamically infer the types (as well as honour the dynamic mappings that we specified)


Now restart the connector (make sure you did delete the Elasticsearch index per above, otherwise you'll see no difference in the mappings)

export CLASSPATH=/opt/kafka-connect-elasticsearch/*
export JMX_PORT=4243
/usr/bin/connect-standalone /etc/kafka/connect-standalone.properties /opt/elasticsearch-2.4.0/config/elasticsearch-kafka-connect.properties

And go check out Elasticsearch, first the mapping:

$ curl -X "GET" "http://localhost:9200/soe.logon/_mapping?pretty"

      "soe.logon" : {
        "mappings" : {
              "LOGON_ID" : {
                "type" : "double"
              "current_ts" : {
                "type" : "date",
                "format" : "YYYY-MM-dd HH:mm:ss.SSSSSS"
              "op_ts" : {
                "type" : "date",
                "format" : "YYYY-MM-dd HH:mm:ss.SSSSSS"

Note that the two timestamp columns are now date type. If you still see them as strings, make sure you've set the topic.schema.ignore configuration as shown above in the Kafka Connect properties for the Elasticsearch connector.

Looking at the row count, we can see that all the records from the topic have been successfully replayed from Kafka and loaded into Elasticsearch. This ability to replay data on demand whilst developing and testing the ingest into a subsequent pipeline is a massive benefit of using Kafka!

$ curl -s -X "GET" "http://localhost:9200/soe.logon/_count?pretty=true"|jq '.count'

Over in Kibana head to the Index Patterns setting page (http://<server>:5601/app/kibana#/settings/indices), or from the Settings -> Indices menu buttons at the top. If you already have the index defined here then delete it - we want Kibana to pick up the new shiny version we've created because it includes the timestamp columns. Now configure a new index pattern:

Note that the Time-field name field is now populated. I've selected op_ts. Click on Create and then go to the Discover page (from the option at the top of the page). You may well see "No results found" - if so use the button in the top-right of the page to change the time window to broaden it to include the time at which you inserted record(s) to the SOE.LOGON table in the testing above.

To explore the data further you can click on the add button that you get when hovering over each of the fields on the left of the page, which will add them as columns to the main table, replacing the default _source (which shows all fields):

In this example you can see that there was quite a few testing records inserted (op_type = I), with nothing changing between than the LOGON_DATE.

Connector errors after adding dynamic templates

Note that if you get an error like this when running the connector:

[pool-2-thread-1] ERROR org.apache.kafka.connect.runtime.WorkerSinkTask - Task elasticsearch-sink-02-0 threw an uncaught and unrecoverable exception
org.apache.kafka.connect.errors.ConnectException: Cannot create mapping:{"kafka-connect":{"properties":{"[...]
at io.confluent.connect.elasticsearch.Mapping.createMapping(Mapping.java:65)

then check the Elasticsearch log/stdout, where you'll find more details. This kind of thing that can cause problems would be an index not deleted before re-running it with the new template, as well as a date format in the template that doesn't match the data.

Running a Full Swingbench Test Configuration

If you've made it this far, congratulations! Now we're going to set up the necessary configuration to run Swingbench. This will generate a stream of changes to multiple tables, enabling us to get a feel for how the pipeline behaves in 'real world' conditions.

To start will, let's get a list of all the tables involved:

$ rlwrap sqlplus soe/soe@orcl

SQL*Plus: Release Production on Tue Sep 6 11:45:02 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Fri Sep 02 2016 15:49:03 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> select table_name from user_tables;


11 rows selected.


The OGG replication is already defined with a wildcard, to pick up all tables in the SOE schema:

[oracle@bigdatalite config]$ cat /u01/ogg/dirprm/ext1.prm
EXTTRAIL ./dirdat/lt

[oracle@bigdatalite config]$ cat /u01/ogg-bd/dirprm/rconf.prm
TARGETDB LIBFILE libggjava.so SET property=dirprm/conf.props
MAP *.*.*, TARGET *.*.*;

The OGG Kafka Connect handler will automatically create a topic for every table that it receives from OGG. So all we need to do now is add each table to the Elasticsearch Sink configuration. For this, I created a second version of the configuration file, at /opt/elasticsearch-2.4.0/config/elasticsearch-kafka-connect-full.properties

topic.index.map=ORCL.SOE.WAREHOUSES:soe.warehouses ,ORCL.SOE.PRODUCT_INFORMATION:soe.product_information ,ORCL.SOE.PRODUCT_DESCRIPTIONS:soe.product_descriptions ,ORCL.SOE.ORDER_ITEMS:soe.order_items ,ORCL.SOE.ORDERS:soe.orders ,ORCL.SOE.ORDERENTRY_METADATA:soe.orderentry_metadata ,ORCL.SOE.LOGON:soe.logon ,ORCL.SOE.INVENTORIES:soe.inventories ,ORCL.SOE.CUSTOMERS:soe.customers ,ORCL.SOE.CARD_DETAILS:soe.card_details ,ORCL.SOE.ADDRESSES:soe.addresses

Having created the configuration, run the connector. If the previous connector from the earlier testing is running then stop it first, otherwise you'll get a port clash (and be double-processing the ORCL.SOE.LOGON topic).

/usr/bin/connect-standalone /etc/kafka/connect-standalone.properties /opt/elasticsearch-2.4.0/config/elasticsearch-kafka-connect-full.properties
Running Swingbench

I'm using charbench which is a commandline interface for Swingbench:

$ /opt/swingbench/bin/charbench -cs localhost:1521/orcl -u soe -p soe -v trans,users
Author  :        Dominic Giles
Version :

Results will be written to results.xml.
Hit Return to Terminate Run...

Time            NCR     UCD     BP      OP      PO      BO      SQ      WQ      WA      Users
06:59:14        0       0       0       0       0       0       0       0       0       [0/1]
06:59:15        0       0       0       0       0       0       0       0       0       [0/1]
06:59:16        0       0       0       0       0       0       0       0       0       [0/1]
06:59:17        0       0       0       0       0       0       0       0       0       [0/1]
06:59:18        0       0       0       0       0       0       0       0       0       [0/1]
06:59:19        0       0       0       0       0       0       0       0       0       [0/1]
06:59:20        0       0       0       0       0       0       0       0       0       [0/1]
06:59:21        0       0       0       0       0       0       0       0       0       [0/1]
06:59:22        0       0       0       0       0       0       0       0       0       [1/1]
06:59:23        2       0       2       0       0       0       0       0       0       [1/1]
06:59:24        3       0       4       5       0       0       0       0       0       [1/1]

Each of the columns with abbreviated headings are different transactions run, and as soon as you see numbers above zero in them it indicates that you should be getting data in the Oracle tables, and thus through into Kafka and Elasticsearch.

Auditing the Pipeline

Let's see how many records are on the ORDERS table:

$ rlwrap sqlplus soe/soe@orcl

SQL*Plus: Release Production on Tue Sep 6 12:21:13 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Tue Sep 06 2016 12:21:09 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> select count(*) from orders;


But, this includes the records that were pre-seeded by Swingbench before we set up the OGG extract. How do we know how many have been read by GoldenGate since, and should therefore be downstream on Kafka, and Elasticsearch? Enter logdump. This is a GoldenGate tool that gives a commandline interface to analysing the OGG trail file itself. You can read more about it here, here, and here.

First, determine the trail file name:

$ ls -l /u01/ogg/dirdat/*
-rw-r-----. 1 oracle oinstall 64068 Sep  9 10:16 /u01/ogg/dirdat/lt000000015

And then launch logdump (optionally, but preferably, with rlwrap to give command history and search):

$ cd /u01/ogg/
$ rlwrap ./logdump

From the Logdump > prompt, open the trail file:

Logdump 1 >OPEN /u01/ogg/dirdat/lt000000015
Current LogTrail is /u01/ogg/dirdat/lt000000015

and then filter to only show records relating to the table we're interested in:


and then give a summary of the records present:

Logdump 3 >COUNT
LogTrail /u01/ogg/dirdat/lt000000015 has 46 records
Total Data Bytes              14056
  Avg Bytes/Record              305
Insert                           22
Update                           23
Metadata Records                  1
After Images                     45
Filtering matched           46 records
          suppressed       208 records

Here we can see that there are a total of 45 insert/update records that have been captured.

Let's check the replicat's trail file also matches:

$ ls -l /u01/ogg-bd/dirdat/*
-rw-r-----. 1 oracle oinstall 64416 Sep  9 10:16 /u01/ogg-bd/dirdat/rt000000000

$ cd /u01/ogg-bd
$ rlwrap ./logdump

Oracle GoldenGate Log File Dump Utility
Version OGGCORE_12.

Copyright (C) 1995, 2016, Oracle and/or its affiliates. All rights reserved.

Logdump 1 >OPEN /u01/ogg-bd/dirdat/rt000000000
Current LogTrail is /u01/ogg-bd/dirdat/rt000000000
Logdump 3 >COUNT
LogTrail /u01/ogg-bd/dirdat/rt000000000 has 46 records
Total Data Bytes              14056
  Avg Bytes/Record              305
Insert                           22
Update                           23
Metadata Records                  1
After Images                     45
Filtering matched           46 records
          suppressed       213 records

Average of 3 Transactions
Bytes/Trans .....       5421
Records/Trans ...         15
Files/Trans .....          4

Looks good - a total of 45 records again.

So from OGG, the data flows via the Kafka Connect connect into a Kafka topic, one per table. We can count how many messages there are on the corresponding Kafka topic by running a console consumer, redirecting the messages to file (and using & to return control to the console):

$ kafka-console-consumer --zookeeper localhost:2181 --topic ORCL.SOE.ORDERS --from-beginning > /tmp/kafka_orcl.soe.orders &

and then issue a wc to count the number of lines currently in the resulting file:

$  wc -l /tmp/kafka_orcl.soe.orders
45 /tmp/kafka_orcl.soe.orders

Since the console consumer process is still running in the background (type fg to bring it back to the foreground if you want to cancel it), you can re-issue the wc as required to see the current count of messages on the topic.

Finally, to see the number of documents on the corresponding Elasticsearch index:

$ curl -s -X "GET" "http://localhost:9200/soe.orders/_count?pretty=true"|jq '.count'

Here we've proved that the number of records written by Oracle are making it all the way through our pipeline.

Monitoring the Pipeline

Kafka and Kafka Connect expose metrics through JMX. There's a variety of tools for capturing, persisting, and visualising this, such as detailed here. For now, we'll just use JConsole to inspect the metrics and get an idea of what's available.

You'll need a GUI for jconsole, so either a desktop session on the server itself, X11 forwarded, or you can also run JConsole from a local machine (it's bundled with any JDK) and connect to the remote JMX. In this example I simply connected to the VM's desktop and ran JConsole locally there. You launch it by running it from the shell prompt:

$ jconsole

From here I connected to the 'Remote Process' on localhost:4242 to access the Kafka server process (because it's running as root the jconsole process (running as a non-root user) can't connect to it as a 'Local Process'). The port 4242 is what I specified as an environment variable as part of the kafka process launch.

On the MBeans tab there are a list of MBeans under which the bespoke application metrics (as opposed to JVM ones like heap memory usage) are found. For example, the rate at which data is being received and sent from the cluster:

By default when you see an attribute for an MBean is it point-in-time - doubleclick on it to make it a chart that then tracks subsequent changes to the number.

By connecting to localhost:4243 (press Ctrl-N for a new connection in the same JConsole instance) you can inspect the metrics from the Kafka Connect elasticsearch sink

You can also access JMX metrics for the OGG Kafka handler by connecting to the local processs (assuming you're running JConsole locally). To find the PID for the RCONF replicat, run:

$ pgrep -f RCONF

Then select that PID from the JConsole connection list - note that the process name may show as blank.

The producer stats show metrics such as the rate at which topic is being written to:


In this article we've seen how stream transactions from a RDBMS such as Oracle into Kafka and out to a target such as Elasticsearch, utilising the Kafka Connect platform and its standardised connector framework. We also saw how to validate and audit the pipeline at various touchpoints, as well as a quick look at accessing the JMX metrics that Kafka provides.

This article was also posted on the Confluent blog, head over there for more great Kafka-related content!

Categories: BI & Warehousing

OTN Appreciation Day: - the day after - BPEL, SOASuite and SCA (in that order...)

Darwin IT - Wed, 2016-10-12 07:21
Unfortunately I noticed this nice initiative only yesterday: OTN Appreciation Day. I did not had a change to cook something up, but I do like to add some mustard after the meal, as we say in Dutch.

In the titles of the 'OTN Appreciation Day'-blogs I miss BPEL. In 2004,when Oracle acquired Collaxa, I worked at Oracle Consulting in the Netherlands. I worked with Oracle Workflow and Interconnect. Oracle wasn't yet into SOA really. But with BPEL PM they acquired the tool that stood at the base of SOA Suite, together with Webservices Manager and what we now know as the Mediator. And it changed my professional life, really. Its extremely powerful, especially with the added  JCA Adapters, xslt-mapper, and since 11g the SCA architecture wich enables you to assemble composite applications with Adapters BusinessRules, Human Workflow, Mediator, Spring Components and of course ... BPEL 2.0.

Sorry, Tim, for being late.

Is it possible to find out the problem SQL in a procedure which was executed days ago

Tom Kyte - Wed, 2016-10-12 06:26
Hi Team, Our system was written by plenty a lot of procedures, and one of them did not performance well. This procedure includes tens of SQL statments, suppose that there's only 1 or 2 SQL statements in the procedure caused this problem. I mean, th...
Categories: DBA Blogs


Tom Kyte - Wed, 2016-10-12 06:26
CLSN00107: CRS-5017: L'action de ressource "ora.dbfor.db start" a rencontre l'erreur suivante : ORA-12546: TNS : acces refuse . Pour plus de details, reportez-vous a "(:CLSN00107:)" dans "/exec/products/oracle/grid/log/servfor/agent/ohasd/oraagent_...
Categories: DBA Blogs

Why was my job not running?

Tom Kyte - Wed, 2016-10-12 06:26
Dear all, I have tried to create a job (wizard) in Oracle SQL Developer 4.0 as follows: ----------SUMMARY------------------ Job Name - TEST Enabled - true Description - Job Class - null Type of Job - PL/SQL Block When to Execute Job - IM...
Categories: DBA Blogs


Subscribe to Oracle FAQ aggregator