Amardeep Sidhu

Little bit of fun with Oracle and the related technologies...

TNS-12543: TNS:destination host unreachable

Fri, 2017-07-14 23:53

Scenario: setting up a physical standby from Exadata to a non-Exadata single instance. tnsping from the standby to the primary works fine, but tnsping from the primary to the standby fails with:

TNS-12543: TNS:destination host unreachable

I was able to ssh and ping the standby from the primary, but tnsping didn't work. From the error description we can figure out that something is blocking access. In this case it was iptables, which was enabled on the standby server.

Stopping the service resolved the issue.

service iptables stop
chkconfig iptables off
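If disabling the firewall altogether is not an option, a minimal alternative (a sketch, assuming the listener is on the default port 1521) is to open just the listener port and leave iptables running:

# See what iptables is currently filtering on the standby host
iptables -L -n

# Allow inbound connections to the listener port (1521 assumed here)
iptables -I INPUT -p tcp --dport 1521 -j ACCEPT

# Persist the rule across reboots (RHEL/OL iptables service style)
service iptables save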

The error is an obvious one but sometimes it just doesn’t strike you that it could be something simple like that.


ORA-12154 in Data Guard environment

Wed, 2017-05-31 10:54

Hit this silly issue in one of the data guard environments today. Primary is a 2 node RAC running 11.2.0.4 and standby is also a 2 node RAC. Archive logs from node2 aren’t shipping and the error being reported is

ORA-12154: TNS:could not resolve the connect identifier specified

We tried the usual things: going to $TNS_ADMIN, checking the entry in tnsnames.ora and also trying to connect using sqlplus sys@target as sysdba. Everything seemed fine, but logs were not shipping and the same problem was being reported repeatedly. Since everything on node1 was working fine, it looked even more weird.

From the error it is clear that the issue is with the tnsnames entry. I finally found it after some 30 minutes. It was an Oracle EBS environment, so TNS_ADMIN was set to the standard $ORACLE_HOME/network/admin/<hostname> path (on both nodes). On node1 there was no tnsnames.ora file in $ORACLE_HOME/network/admin, so it was connecting to the standby using the Apps tnsnames.ora, which had the correct entry for the standby. On node2 there was a tnsnames.ora file in $ORACLE_HOME/network/admin (the default TNS path), but it had no entry for the standby. Log shipping was trying to connect using that file and failing with ORA-12154. Once we removed that file, it started using the Apps tnsnames.ora and logs started shipping.
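A quick way to see which tnsnames.ora an alias is really being resolved from is to compare the default location with $TNS_ADMIN on each node. A small sketch (STBY is a hypothetical alias name used here for illustration):

# Where is TNS_ADMIN pointing on this node?
echo $TNS_ADMIN

# Does the alias exist in the default location and/or the TNS_ADMIN location?
grep -il "STBY" $ORACLE_HOME/network/admin/tnsnames.ora $TNS_ADMIN/tnsnames.ora 2>/dev/null

# And does it actually resolve?
tnsping STBY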


Failed to create voting files on disk group RECOC1

Fri, 2017-04-28 04:01

Long story short, I faced this issue while running OneCommand for an Exadata system. The root.sh step (Initialize Cluster Software) was failing with the following error on the screen:

Checking file root_dm01dbadm02.in.oracle.com_2017-04-27_18-13-27.log on node dm01dbadm02.somedomain.com
Error: Error running root scripts, please investigate…
Collecting diagnostics…
Errors occurred. Send /u01/onecommand/linux-x64/WorkDir/Diag-170427_181710.zip to Oracle to receive assistance.

That doesn't tell us much, so let us check the log file of this step:

2017-04-27 18:17:10,463 [INFO][  OCMDThread][        ClusterUtils:413] Checking file root_dm01dbadm02.somedomain.com_2017-04-27_18-13-27.log on node inx321dbadm02.somedomain.com
2017-04-27 18:17:10,464 [INFO][  OCMDThread][        OcmdException:62] Error: Error running root scripts, please investigate…
2017-04-27 18:17:10,464 [FINE][  OCMDThread][        OcmdException:63] Throwing OcmdException… message:Error running root scripts, please investigate…

So we need to go to the root.sh log file now. That shows:

Failed to create voting files on disk group RECOC1.
Change to configuration failed, but was successfully rolled back.
CRS-4000: Command Replace failed, or completed with errors.
Voting file add failed
2017/04/27 18:16:37 CLSRSC-261: Failed to add voting disks

Died at /u01/app/12.1.0.2/grid/crs/install/crsinstall.pm line 2068.
The command '/u01/app/12.1.0.2/grid/perl/bin/perl -I/u01/app/12.1.0.2/grid/perl/lib -I/u01/app/12.1.0.2/grid/crs/install /u01/app/12.1.0.2/grid/crs/install/rootcrs.pl' execution failed

Makes some sense, but we still can't see why the voting file creation on RECOC1 failed. Let us check the ASM alert log as well:

NOTE: Creating voting files in diskgroup RECOC1
Thu Apr 27 18:16:36 2017
NOTE: Voting File refresh pending for group 1/0x39368071 (RECOC1)
Thu Apr 27 18:16:36 2017
NOTE: Attempting voting file creation in diskgroup RECOC1
NOTE: voting file allocation (replicated) on grp 1 disk RECOC1_CD_00_DM01CELADM01
NOTE: voting file allocation on grp 1 disk RECOC1_CD_00_DM01CELADM01
NOTE: voting file allocation (replicated) on grp 1 disk RECOC1_CD_00_DM01CELADM02
NOTE: voting file allocation on grp 1 disk RECOC1_CD_00_DM01CELADM02
NOTE: voting file allocation (replicated) on grp 1 disk RECOC1_CD_00_DM01CELADM03
NOTE: voting file allocation on grp 1 disk RECOC1_CD_00_DM01CELADM03
ERROR: Voting file allocation failed for group RECOC1
Thu Apr 27 18:16:36 2017
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_228588.trc:
ORA-15274: Not enough failgroups (5) to create voting files

So we can see the issue here. We can also look at the trace file mentioned above for more detail.

Now, why did this happen?

RECOC1 is a HIGH redundancy disk group, which means that if we want to place voting files there it needs at least 5 failure groups. In this configuration there are only 3 cells, which doesn't meet the minimum failure group requirement (1 cell = 1 failgroup on Exadata).
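To see how many failure groups a disk group actually has, a quick check like the following does the job (a rough sketch, run as the grid user with the local ASM instance's environment set):

# Count the failure groups in RECOC1
sqlplus -s / as sysasm <<'EOF'
select g.name diskgroup, count(distinct d.failgroup) failgroups
from v$asm_diskgroup g, v$asm_disk d
where d.group_number = g.group_number
and g.name = 'RECOC1'
group by g.name;
EOF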

Now, how did it happen?

This was an Exadata X3 half rack and we planned to deploy it (for testing purposes) as 2 quarter racks: the 1st cluster with db1, db2 + cell1, cell2, cell3 and the 2nd cluster with db3, db4 + cell4, cell5, cell6, cell7, with all the disk groups in High redundancy.

Before a certain 12.x Exadata software version it was not even possible to have all disk groups in High redundancy in a quarter rack, because placing the voting disks in a High redundancy disk group needs a minimum of 5 failure groups (as mentioned above), and in a quarter rack you have only 3. With that 12.x Exadata software version a new feature called quorum disks was introduced, which made this configuration possible. Read this link for more details. Basically, a slice of disk is taken from each DB node and added to the disk group where the voting files are to be placed: 3 cells + 2 disks from the DB nodes makes 5 failure groups, so all is good.
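For reference, the quorum disk setup boils down to something like this (purely a hedged sketch; the failgroup names and device paths below are made up, and on a real deployment OneCommand and the quorum disk manager do this for you):

# Each DB node exports a small quorum device, which is then added to the
# disk group as a QUORUM failgroup, taking the failgroup count from 3 to 5
sqlplus / as sysasm <<'EOF'
alter diskgroup RECOC1
  add quorum failgroup DM01DBADM01 disk '/dev/exadata_quorum/QD_RECOC1_DM01DBADM01'
      quorum failgroup DM01DBADM02 disk '/dev/exadata_quorum/QD_RECOC1_DM01DBADM02';
EOF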

While starting with the deployment we noticed that DB node1 had some hardware issues. Since we needed the machine for testing, we decided to build the first cluster with 1 DB node only, so the final configuration of the 1st cluster was 1 DB node + 3 cells. We imported the XML back into OEDA, modified the cluster 1 configuration to 1 DB node and generated the configuration files. That is where the problem started. The RECO disk group was still High redundancy, but with only 1 DB node at this stage the configuration was not even a candidate for quorum disks. Hence the above error. Changing DBFS_DG to Normal redundancy fixed the issue, because when DBFS_DG is Normal redundancy OneCommand places the voting files there.

Ideally it shouldn't have happened, as OEDA shouldn't allow a configuration that is not doable. The catch here is that the original configuration had 2 DB nodes + 3 cells, so High redundancy for all disk groups was allowed in OEDA. When the configuration was modified and one DB node was removed from the cluster, OEDA probably didn't re-run the redundancy check on the disk groups and let us go past that screen. If you try to create a new configuration with 1 DB node + 3 cells, it will not allow you to choose High redundancy for all disk groups; DBFS will remain in Normal redundancy and you can't change that.


OneCommand Step 1 error

Mon, 2017-04-10 11:50

Hit this silly issue while doing an Exadata deployment for a customer. Step 1 was giving the following error:

ERROR: 192.168.99.102 configured on dm01celadm01.example.com as dm01dbadm02 does not match expected value dm01dbadm02.example.com

I wasn't able to make sense of it for quite some time, until a colleague pointed out that the reverse lookup entries should be created for the FQDN only. As is clear from the above message, the reverse lookup of the IP 192.168.99.102 returns dm01dbadm02 instead of dm01dbadm02.example.com. Fixing this in DNS resolved the issue.

Actually the customer had created reverse lookup entries for both the short hostname and the FQDN. Since DNS can return the results in any order, the error message was a bit random: whenever the short hostname was returned first, Step 1 gave an error, but when the FQDN was returned first, there was no error in Step 1 for that IP.
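A quick pre-check (a small sketch using the IP from the error above) is to run the reverse lookup a few times and confirm that only the FQDN comes back:

# The PTR record should resolve to the FQDN only
nslookup 192.168.99.102

# dig shows all PTR records at once; there should be a single FQDN entry
dig +short -x 192.168.99.102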


Oracle RAC 12.1 – lsnodes exited with code 9

Tue, 2017-03-28 11:31

I was trying to do a 2 node RAC setup on Solaris 11.3 where Oracle Solaris Cluster 4.3 was already configured. The installer was running but the Cluster Node Information screen was appearing like this:

[screenshot: Cluster Node Information screen failing to list the cluster nodes]

The install log shows this:

INFO: Checking cluster configuration details

INFO: Found Vendor Clusterware. Fetching Cluster Configuration

INFO: Executing [/tmp/OraInstall2017-03-28_12-50-48PM/ext/bin/lsnodes]

with environment variables {TERM=xterm, LC_COLLATE=, SHLVL=3, JAVA_HOME=, XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt, SSH_CLIENT=172.16.64.55 56370 22, LC_NUMERIC=, LC_MESSAGES=, MAIL=/var/mail/oracle, PWD=/export/software/grid/grid, XTERM_VERSION=XTerm(320), WINDOWID=2097165, LOGNAME=oracle, _=*50727*/export/software/grid/grid/install/.oui, NLSPATH=/usr/dt/lib/nls/msg/%L/%N.cat, SSH_CONNECTION=172.16.64.55 56370 172.16.72.18 22, OLDPWD=/export/oracle, LC_CTYPE=, CLASSPATH=, PATH=/usr/bin:/usr/ccs/bin:/usr/bin:/bin:/export/software/grid/grid/install, LC_ALL=, DISPLAY=localhost:10.0, LC_MONETARY=, USER=oracle, HOME=/export/oracle, XTERM_SHELL=/bin/bash, XAUTHORITY=/tmp/ssh-xauth-mlq21a/xauthfile, A__z=”*SHLVL, XTERM_LOCALE=en_US.UTF-8, TZ=localtime, LC_TIME=, LANG=en_US.UTF-8}

INFO: Starting Output Reader Threads for process /tmp/OraInstall2017-03-28_12-50-48PM/ext/bin/lsnodes

INFO: The process /tmp/OraInstall2017-03-28_12-50-48PM/ext/bin/lsnodes exited with code 9

So we can see the problem: lsnodes is not able to list the nodes. Let us try to run that command manually.

-bash-4.1$ export PATH=/usr/bin:/usr/ccs/bin:/usr/bin:/bin:/export/software/grid/grid/install

-bash-4.1$ /tmp/OraInstall2017-03-28_12-50-48PM/ext/bin/lsnodes

ld.so.1: lsnodes: fatal: libskgxn2.so: open failed: No such file or directory

Killed

-bash-4.1$

So it looks like it is not able to find a library called libskgxn2.so. A find for this file name shows that it is present at /usr/cluster/lib/sparcv9/libskgxn2.so.

Some googling and MOS searches revealed that it expects the library to be present in /opt/ORCLcluster/lib, but that directory doesn't exist here. As a workaround we can create the directory manually and create a symbolic link to libskgxn2.so, as shown below.
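A sketch of the workaround, run as root (the paths are the ones found above):

# Create the directory Oracle expects and link in the Solaris Cluster library
mkdir -p /opt/ORCLcluster/lib
ln -s /usr/cluster/lib/sparcv9/libskgxn2.so /opt/ORCLcluster/lib/libskgxn2.so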

The lsnodes command worked fine after this workaround, and the installer also showed both nodes listed.


ERROR: SPFile in diskgroup does not match the specified spfile

Tue, 2016-09-20 11:42

Just a stupid error. Posting it so that someone else googling for the same thing can get a clue.

An ASM instance was running with default parameters (no pfile, no spfile). I updated the spfile for the instance with the asmcmd spset command and bounced CRS. Even after the restart it still wasn't using the spfile. Got puzzled and checked the GPnP settings again; all looked good. Then I came across this in the alert log:

ERROR: SPFile in diskgroup <> does not match the specified spfile +DATA/asm/asmparameterfile/registry.253.769187275

The problem was that while copying the spfile path, the complete name didn't get copied; the last character was missed, so the filename it was looking for wasn't there. Updating the GPnP profile with the correct filename and bouncing CRS resolved the issue.
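The checks boil down to something like this (a sketch run as the grid user; the ls path is the one from the error above, and the spset argument is a placeholder for the full, untruncated name):

# What spfile does the GPnP profile point to?
asmcmd spget

# Does a file with exactly that name exist in the disk group?
asmcmd ls -l +DATA/asm/asmparameterfile/

# Re-point the profile at the complete name and bounce CRS
asmcmd spset '<full spfile path as shown by asmcmd ls>'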


addNode.sh, failed root.sh and IB listener

Tue, 2016-09-13 09:14

So this customer has an Exadata quarter rack and they have an IB listener configured on both DB nodes (for DB connections from a multi-racked Exalogic system). We were adding a new DB node to this rack, so we just followed the standard procedure of creating users, directories etc. on the new node, setting up ssh equivalence and running addNode.sh. All went fine but root.sh failed. A little digging into the logs revealed that it failed while running srvctl start listener -n <node_name>.

If we run this command manually, it immediately reveals what the problem is: it is not able to start the IB listener on the new node because the IB VIP doesn't exist yet. The same could happen for any of the additional networks added.

There is a MOS note that describes this exact situation, but the solution it gives is to remove the additional listener, complete addNode.sh & root.sh, and add the additional listener back. That wasn't possible in this case. After a little bit of googling I stumbled upon this post by Jeremy Schneider. His colleague solved this problem with a very simple and clever workaround: just before root.sh gets to the srvctl start listener command, run the add VIP command from another window ;) The additional network will already have been added by the time root.sh runs on the new node.

To be able to perform this trick, you have to have the hosts file updated with the new VIP name and IP, and be ready with the command to add the VIP. While root.sh is running it will show a message like "there is already an active cluster, restarting to join"; at that point start trying to run the srvctl add vip command in another window. The moment CRS comes up, the command will succeed. Immediately after that root.sh runs the srvctl start listener command, and this time it shouldn't fail as the VIP is already there.
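The command to keep ready looks roughly like this (a hedged sketch with made-up names, using the 11.2-style srvctl syntax where -k is the network number of the additional IB network; it has to be run as root):

# As root, in a second window, as soon as CRS is up on the new node
srvctl add vip -n dm01dbadm03 -k 2 -A dm01dbadm03-ibvip.example.com/255.255.255.0/bondib0
srvctl start vip -i dm01dbadm03-ibvip.example.com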

Another small mistake we made was not updating cellip.ora on the new node before running root.sh. That caused root.sh to fail as it couldn't reach the grid disks on the existing storage cells. Updating cellip.ora with the existing storage cell IPs fixed the problem.
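For completeness, /etc/oracle/cell/network-config/cellip.ora on the new node just needs the same cell IB IPs that the existing DB nodes already have, one line per cell (the IPs below are made up):

cell="192.168.10.3"
cell="192.168.10.4"
cell="192.168.10.5"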


OEDA–Things to keep an eye on

Thu, 2016-09-08 05:14

So if you are filling in OEDA for an Exadata deployment, there are a few things you should take care of. Most of the screens are self explanatory, but there are some bits where one should focus a little more. I am running the August version and the notes below are based on that version.

  1. On the Define Customer Networks screen, the client network is the actual network where your data is going to flow. So typically it is going to be bonded (for high availability) and, depending upon the network in your data center, you have to select either 1/10 G copper or 10 G optical.

  2. If you are going to use trunk VLANs for your client network, remember to enable it by clicking the Advanced button and then entering the relevant VLAN id. Also, if it is going to be an OVM configuration, you may want to have different VMs in different VLAN segments; OEDA allows you to change the VLAN ids for individual VMs on the respective cluster screens.

  3. If all the cores aren't licensed, remember to enable Capacity on Demand (COD) on the Identify Compute Node OS screen.

  4. On the Define Clusters screen make sure that you enter a cluster name that is unique across your environment.

  5. The cluster details screen captures some of the most important details, like:
    1. Whether you want the flash cache in WriteBack mode instead of WriteThrough.
    2. Whether you want a role separated install or want to install both GI and Database binaries as the oracle user itself.
    3. GI & Database versions and the homes for the binaries. It is always good to leave these at the Oracle recommended values as that makes future maintenance easier and less painful.
    4. Disk group names, redundancy and space allocation.
    5. Default database name and type (OLTP or DW).
Of course it is important to fill in the information carefully on all the screens, but the ones above are some that should be filled in very carefully, after gathering the required information from other teams where needed.


ORA-56841: Master Diskmon cannot connect to a CELL

Thu, 2016-05-19 11:45

Faced this error while querying v$asm_disk after adding new storage cell IPs to cellip.ora on the DB nodes of an existing cluster on Exadata. The query ends with ORA-03113 end-of-file on communication channel, and ORA-56841 is reported in $ORA_CRS_HOME/log/<hostname>/diskmon/diskmon.log. The reason in my case was that the new cell was using a different subnet for IB: it was pingable from the DB nodes, but querying v$asm_disk wasn't working. Changing the IB subnet on the new cell to the one used by the existing cells fixed the issue.
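A quick way to spot such a mismatch (a rough sketch; the interface and host names are assumptions, typically bondib0 on DB nodes and ib0/ib1 on cells) is to compare the IB addresses and netmasks on a DB node and on the new cell:

# On a DB node
ip addr show bondib0 | grep "inet "

# On the new cell (hostname made up) - the subnet should match the existing cells
ssh root@dm01celadm04 "ip addr show ib0 | grep 'inet '"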

 

 


ORA-01671 while creating cascaded standby from standby using RMAN DUPLICATE

Sat, 2015-12-19 09:20

On a T5 Super Cluster (running 11.2.0.3) I was creating a cascaded standby from an already functional standby using RMAN DUPLICATE, and it errored out with:

ORA-01671: control file is a backup, cannot make a standby control file

 

A quick search reveals that it is bug 11715084, which affects most of the 11.x versions except 11.2.0.4. There is a one-off patch available for most versions, or one can install a bundle patch that includes the fix for this bug. I applied BP26 and it worked fine after that.
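To confirm that the installed bundle patch actually contains the fix, you can look for the bug number in the patch inventory (a small sketch):

# Look for bug 11715084 in the list of bugs fixed by the installed patches
$ORACLE_HOME/OPatch/opatch lsinventory | grep -i 11715084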


MRP process on standby stops with ORA-00600

Thu, 2015-08-20 04:03

A rather unremarkable post about an ORA-00600 error I faced on a standby database. The environment was 11.2.0.3 on a Sun Super Cluster machine. The MRP process was hitting ORA-00600 while trying to apply a specific archive log.

The error message was something like this

MRP0: Background Media Recovery terminated with error 600
Errors in file /u01/app/oracle/product/11.2.0.3/diag/diag/rdbms/xxxprd/xxxprd1/trace/xxxprd1_pr00_6342.trc:
ORA-00600: internal error code, arguments: [2619], [539], [], [], [], [], [], [], [], [], [], []
Recovery interrupted!

Some googling and MOS searches revealed that the error was due to a corrupted archive log file. Recopying the archive file from the primary and restarting the recovery resolved the issue. The first argument of the ORA-600 is actually the sequence number of the archive log it was trying to apply.
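The fix was essentially this (a sketch; the host, path and file names are made up, and the sequence number 539 comes from the ORA-600 arguments above):

# Recopy the suspect archive log from the primary
scp oracle@primary01:/arch/xxxprd_1_539_123456789.arc /arch/

# Restart managed recovery on the standby
sqlplus / as sysdba <<'EOF'
alter database recover managed standby database disconnect from session;
EOF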


Writing tips

Tue, 2015-05-19 04:14

Tim Hall has written some brilliant posts about getting going with writing (blogs, whitepapers etc.). This post is the direct result of that inspiration. Tim's advice is to just get started with whatever you have ;)

If you are into blogging but not very active, or even if you aren't blogging at all, you may want to take a look at all the posts to get some inspiration to document the knowledge you gain on a day-to-day basis.

Here is an index of all the posts by Tim so far:

http://oracle-base.com/blog/2015/05/11/writing-tips-why-should-i-bother/

http://oracle-base.com/blog/2015/05/12/writing-tips-how-do-i-start/

http://oracle-base.com/blog/2015/05/13/writing-tips-writing-style/

http://oracle-base.com/blog/2015/05/14/writing-tips-how-do-i-stay-motivated/

http://oracle-base.com/blog/2015/05/15/writing-tips-dealing-with-comments-and-criticism/

http://oracle-base.com/blog/2015/05/18/writing-tips-should-i-go-back-and-rewrite-revise-remove-old-posts/

http://oracle-base.com/blog/2015/05/19/writing-tips-how-often-should-i-write/

Enjoy !


Want to learn Exadata ?

Fri, 2015-01-02 03:19

Many people have asked me how they can learn Exadata. It sounds even more difficult because a lot of people don't have access to Exadata environments, so I thought about writing a small post on the subject.

It actually is not as difficult as it sounds. There are a lot of really good resources available from which you can learn about the Exadata architecture and the things that work differently from any non-Exadata platform. You might be able to do a lot more R&D if you have access to an Exadata environment, but don't worry if you haven't; there is still plenty you can explore. So here we go:

  1. I think the best reference one can start with is the Expert Oracle Exadata book by Tanel Poder, Kerry Osborne and Randy Johnson. As a traditional book it covers the subject topic by topic from the ground up, so it makes a fun read and will teach you a lot. They are already working on the second edition (see here).
  2. Next you can jump to the whitepapers on the Exadata page of the Oracle website, blog posts (keep an eye on OraNA.info) and whitepapers written by other folks. There is a lot of useful material out there; you just need to Google a bit.
  3. The Exadata documentation (not public yet) should be your next stop if you have access to it. It is available as patch 10386736 on MOS.
  4. Try to attend an Oracle Users Group conference if there is one happening in your area. Most likely someone will be presenting on Exadata, so you can use that opportunity to learn about it and ask them questions.
  5. Lastly, if you have an Exadata machine available, do all the R&D you can.

Happy New Year and Happy Learning !


VirtualBox and Windows driver verifier

Wed, 2014-12-03 04:34

I was troubleshooting some Windows hangs on my Desktop system running Windows 8 and enabled driver verifier. Today when I tried to start VirtualBox it failed with this error message.

Failed to load VMMR0.r0 (VERR_LDR_MISMATCH_NATIVE)

Most of the online forums were suggesting a reinstall of VirtualBox to fix the issue, but one of the threads mentioned that it was being caused by Windows Driver Verifier. I disabled it, restarted Windows and VirtualBox worked like a charm. I didn't have time to do more research as I quickly wanted to test something; maybe particular drivers can be excluded from Driver Verifier so that VirtualBox can still work alongside it.


Oracle GoldenGate 11g Handbook

Thu, 2013-07-18 10:02

A few months ago I contributed a chapter (on Monitoring, Troubleshooting and Performance Tuning) to a GoldenGate book from Oracle Press that Robert Freeman was authoring. Thought of posting a small update that the book is now out. My name doesn't appear on the main page :( but you will see it in the Acknowledgements section ;) Below is a screenshot taken from the Amazon preview :)

You may want to grab a copy if you are using/planning to use Oracle GoldenGate 11g.

Here is the link to the book page on Amazon. It seems the book is not published in India yet, but one can order the imported edition on amazon.in.

[screenshot from the Amazon preview]


Oracle database 12c

Wed, 2013-06-26 20:58

So there is a new toy in the market for database geeks: Oracle has released Database 12c. Every social platform is abuzz with 12c activity, so I thought that I should also complete the ritual ;)

In this post Aman has already summed up many important links.

Maria Colgan has posted some useful links here.

And here is a link to a slidedeck about Upgrading and Migrating to 12c.

Happy 12c’ing !


agent deployment error in EM 12c

Sun, 2013-06-16 12:04

Yesterday I was configuring EM 12c for a Sun Super Cluster system. There were a total of 4 LDOMs where I needed to deploy the agent (Setup -> Add Targets -> Add Targets Manually). Out of these 4, everything went fine for 2 LDOMs, but for the other two it failed with an error message. It didn't give many details on the EM screen, but rather suggested trying to secure/start the agent manually. When I tried that, the secure agent part worked fine but the start agent command failed with the following error message:

oracle@app1:~$emctl start agent
Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation.  All rights reserved.
Starting agent ………………………………………………………. failed.
HTTP Listener failed at Startup
Possible port conflict on port(3872): Retrying the operation…
Failed to start the agent after 1 attempts.  Please check that the port(3872) is available.

I thought that there was something wrong with the port, so I cleaned up the agent installation, made sure that the port wasn't in use and did the agent deployment again. This time it failed with the same message, but reported a different port number, i.e. 1830, the agent port:

oracle@app1:~$emctl start agent
Oracle Enterprise Manager Cloud Control 12c Release 2
Copyright (c) 1996, 2012 Oracle Corporation.  All rights reserved.
Starting agent ……………………………………………. failed.
HTTP Listener failed at Startup
Possible port conflict on port(1830): Retrying the operation…
Failed to start the agent after 1 attempts.  Please check that the port(1830) is available.

Again I checked a few things but found nothing wrong. All the LDOMs had a similar configuration, so what worked for the other two should have worked for these two as well.

Before starting with the installation I had noted the LDOM hostnames and IPs in a notepad file and had swapped the IPs of two LDOMs (these two, of course :P ). I later found that and corrected it in the notepad. While looking at the notepad file it occurred to me that the same mistake could be present in /etc/hosts of the server where EM is deployed, and that is exactly what it was. While making the entries in /etc/hosts on the EM server, I had copied them from the notepad while it still had the wrong entries: the IPs of these two LDOMs were swapped with each other, and that was causing the whole problem.
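A simple sanity check that would have caught this (a sketch; the hostnames in the loop are made up) is to compare the /etc/hosts entries on the EM server with what DNS returns for each agent host:

# Compare the hosts file entry with the DNS answer for each LDOM
for h in app1 app2 db1 db2; do
  echo "== $h =="
  grep -w "$h" /etc/hosts            # entry on the EM server
  nslookup "$h" | grep -A1 '^Name:'  # what DNS says
done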

I deinstalled the agent, corrected /etc/hosts and tried to deploy again... all worked well!
