
DBA Blogs

Partner Webcast – Oracle Big Data & Business Analytics: NEOS SNA Solution for Telcos

Behind the hype of big data there's a simple story: for decades companies have been making business decisions based on transactional data stored in relational databases. Beyond that critical data,...

We share our skills to maximize your revenue!
Categories: DBA Blogs

Pillars of PowerShell: Windows OS

Pythian Group - Wed, 2015-06-10 06:42
Introduction

This is the fifth blog post continuing the series on the Pillars of PowerShell. The previous posts in the series are:

  1. Interacting
  2. Commanding
  3. Debugging
  4. Profiling

The Windows Operating System (OS) is something a DBA should know and be familiar with since SQL Server has to run on top of it. I would say that on average most DBAs interact with the OS for troubleshooting purposes. In this post I just want to point out a few snippets of how PowerShell can help you do this type of work.

 Services Console Manager

In the SQL Server 2000 days DBAs became very familiar with typing “services.msc” into the Run prompt and scrolling through the list of services to see what state a service is in, or what login a particular service is configured to run under. Now, if you are performing administrative tasks against SQL Server services it is always advised that you use SQL Server Configuration Manager. However, if you just need to check the status of a service or restart it, PowerShell can help out.

Get-Service

This cmdlet has a few discrepancies that it helps to understand upfront when you start using PowerShell instead of the Services Console. In the Services Console you find the service by the “Name”; this is the “DisplayName” in the Get-Service cmdlet. The “Name” in Get-Service is actually the “Service Name” in the Services Console, do you follow? OK. So with SQL Server the DisplayName for a default instance would be “SQL Server (MSSQLSERVER)”, and the “Name” would be “mssqlserver”. This cmdlet allows you to filter by either field, so the two commands below return the same thing:

Get-Service 'SQL Server (MSSQLSERVER)'
Get-Service mssqlserver

You can obviously see which one is easier to type right off. So with SQL Server you will likely know that a default instance’s name would be queried using “mssqlserver”, and a named instance would be “mssql$myinstance”. So if you wanted to find all of the instances running on a server you could use this one-liner:

Get-Service mssql*
Restart-Service

This does exactly what you think it will, so you have to be careful. You can call this cmdlet by itself and restart a service by referencing the “Name” just as you did with Get-Service. I want to show you how the pipeline can work for you in this situation. You will find some cmdlets in PowerShell that have a few “special” features, and the service cmdlets are included in this category: they accept an array as the input object, either through the parameter or via the pipeline.

So, let’s use the example that I have a server with multiple instances of SQL Server, and all the additional components like SSRS and SSIS. I only want to work with the named instance “SQL12”. I can get the status of all component services with this command:

Get-Service -Name 'MSSQL$SQL12','ReportServer$SQL12','SQLAgent$SQL12','MsDtsServer110'

Now if I need to do a controlled restart of all of those services I can just do this command:

Get-Service -Name 'MSSQL$SQL12','ReportServer$SQL12','SQLAgent$SQL12','MsDtsServer110' |
Restart-Service -Force -WhatIf

The added “-WhatIf” will not actually perform the operation but tell you what it would end up doing. Once I remove that the restart would actually occur. All of this would look something like this in the console:

[console screenshot: Get-Service piped to Restart-Service]

Win32_Service

Some of you may recognize this one as a WMI class, and it is. Using WMI offers you a bit more information than the Get-Service cmdlet. You can see that by just running this code:

Get-Service mssqlserver
Get-WmiObject win32_service | where {$_.name -eq 'mssqlserver'}

The two commands above equate to the same referenced service but return slightly different bits of information by default:

[screenshot: Get-WmiObject win32_service output]

However, if you run the command below, you will see how gathering service info with WMI offers much more potential:

Get-WmiObject win32_service | where {$_.name -eq 'mssqlserver'} | select *

Get-Service will not actually give you the service account. So here is one function I use often (saved in my profile):

function Get-SQLServiceStatus ([string[]]$server)
{
    foreach ($s in $server) {
        Get-WmiObject win32_service -ComputerName $s |
            where {$_.DisplayName -match "SQL "} |
            select @{Label="ServerName";Expression={$s}},
                DisplayName, Name, State, Status, StartMode, StartName
    }
}

One specific thing I did in this function is declare the type of the parameter you pass into it. Using “[string[]]” means the parameter accepts an array, or multiple objects. You can declare your variable this way, but you also have to ensure the function is written in a manner that can process the array; I did this simply by wrapping the commands in a “foreach” loop. So an example use of this against a single server would be:

Get-SQLServiceStatus -server 'MyServer'
If you wanted to run this against multiple servers it would go something like this:

Get-SQLServiceStatus -server 'MyServer','MyServer2','MyServer3' | Out-GridView
#another option
$serverList = 'MyServer','MyServer2','MyServer3'
Get-SQLServiceStatus -server $serverList | Out-GridView
Disk Manager

Every DBA should be very familiar with this management console and can probably get to it blindfolded. You might use this or “My Computer” when you need to see how much free space there is on a drive. If you happen to be working in an environment that only has Windows Server 2012 and Windows 8 or higher, I wish I was there with you. PowerShell 4.0 and higher offers storage cmdlets that let you get information about your disks and volumes much more easily and cleanly. They actually use CIM (Common Information Model), which is what WMI is built upon. I read somewhere that basically “WMI is just Microsoft’s way of implementing CIM”. Microsoft is obviously going back to the standard, as they have done in other areas. It is worth learning more about, and it also allows you to connect to a PowerShell 2.0 machine and get the same information.

Anyway, back to the task at hand. If you are working on PowerShell 3.0 or lower you can use Get-WmiObject and win32_volume to get information similar to what the storage cmdlet Get-Volume returns in 4.0:

Get-Volume
Get-WmiObject win32_volume | select DriveLetter, Label, FileSystem,
@{Label="SizeRemaining";Expression={"{0:N2}" -f($_.FreeSpace/1GB)}},
@{Label="Size";Expression={"{0:N2}" -f($_.Capacity/1GB)}} | Format-Table
[screenshot: win32_volume query output]

Windows Event Viewer

Almost everyone is familiar with and knows their way around the Windows Event Viewer. I actually left this for last for a reason: I want to walk you through an example that I think will help “put it all together” and show what PowerShell can do for you. Our scenario is a server that had a restart that was unexpected, at least to me. There are times when I get paged by our Avail Monitoring product for a customer’s site and need to find out who restarted the server, or why it restarted. The most common place you are going to go for this is the Event Log.

Show-EventLog

If you just want to go through Event Viewer and manually find events, and it is a remote server, I find this to be the quickest method:

Show-EventLog -ComputerName Server1

This command will open Event Viewer and go through the process of connecting you to “Server1”. No more right-clicking and selecting “connect to another computer”!

Get-EventLog

I prefer to just dig into searching for events, and this is where Get-EventLog comes in handy. You can call this cmdlet and provide:

  1. The specific log to look in (System, Application, or Security most commonly)
  2. A time range
  3. A specific entry type (Error, Information, Warning, etc.)

In Windows Server 2003 Microsoft added a group policy, “Shutdown Event Tracker”, which, if enabled, writes particular events to the System Log when a server restarts or shuts down, whether planned or unplanned. After an unplanned event, the first user who logs into the server gets a prompt about the unexpected shutdown. For a planned restart, a similar prompt has to be filled in before the restart will occur. What you can do with this cmdlet is search for those messages in the System Log.

To find the planned restarts you would use:

Get-EventLog -LogName System -Message "*restart*" -ComputerName Server1 |
select * -First 1

Then to find the unplanned simply change “*restart*” to “*shutdown*”:


Get-EventLog -LogName System -Message "*shutdown*" -ComputerName Server1 |
select * -First 1

In this instance I find that SSIS and SSRS failed to start back up. I found this because I checked the status of the SQL Server services using my custom function, Get-SQLServiceStatus:

[screenshot: Get-SQLServiceStatus output]

To search for events around the shutdown I need to find the first event that is written to the Event Log when a server starts up, which comes from the EventLog source. I can then use that event’s time stamp as a reference point and pass it into the Get-EventLog cmdlet to pull up the error events for the SQL Server services that did not start up correctly. I am going to do that with this bit of code:

$t = Get-EventLog -LogName System -Source EventLog -Message "*shutdown*" | select * -First 1
Get-EventLog -LogName System -Before $t.TimeGenerated -Newest 5 -EntryType Error |
select TimeGenerated, Source, Message | Format-Table -Wrap
[screenshot: Get-EventLog error events]

Summary

I hope you found this post useful and it gets you excited about digging deeper into PowerShell. In the next post I am going to close up the series digging into SQL Server and a few areas where PowerShell can help.

 

Learn more about our expertise in SQL Server.

Categories: DBA Blogs

Bug 13914613 Example Shared Pool Latch Waits

Bobby Durrett's DBA Blog - Tue, 2015-06-09 17:18

Oracle support says we have hit bug 13914613.  Here is what our wait events looked like in an AWR report:

Top 5 Timed Foreground Events

Event                      Waits      Time(s)  Avg wait (ms)  % DB time  Wait Class
latch: shared pool             3,497   17,482           4999      38.83  Concurrency
latch: row cache objects         885   12,834          14502      28.51  Concurrency
db file sequential read    1,517,968    8,206              5      18.23  User I/O
DB CPU                                  4,443                      9.87
library cache: mutex X         7,124    2,639            370       5.86  Concurrency

What really struck me about these latch waits was that the average wait time was several thousand milliseconds, which means several seconds.  That’s a long time to wait for a latch.

Oracle pointed to the Latch Miss Sources section of the AWR report.  This is all gibberish to me; I guess these are internal kernel routine names.

Latch Miss Sources

Latch Name    Where                         NoWait Misses  Sleeps  Waiter Sleeps
shared pool   kghfrunp: clatch: wait                    0   1,987          1,956
shared pool   kghfrunp: alloc: session dur              0   1,704          1,364

Bug description says “Excessive time holding shared pool latch in kghfrunp with auto memory management” so I guess the “kghfrunp” latch miss sources told Oracle support that this was my issue.

I did this query to look for resize operations:

SELECT COMPONENT ,OPER_TYPE,FINAL_SIZE Final,
  2  to_char(start_time,'dd-mon hh24:mi:ss') Started,
  3  to_char(end_time,'dd-mon hh24:mi:ss') Ended
  4  FROM V$SGA_RESIZE_OPS;

COMPONENT                 OPER_TYPE               FINAL STARTED                   ENDED
------------------------- ------------- --------------- ------------------------- -------------------------
DEFAULT 2K buffer cache   STATIC                      0 12-may 04:33:01           12-may 04:33:01
streams pool              STATIC            134,217,728 12-may 04:33:01           12-may 04:33:01
ASM Buffer Cache          STATIC                      0 12-may 04:33:01           12-may 04:33:01
DEFAULT buffer cache      INITIALIZING   10,401,873,920 12-may 04:33:01           12-may 04:33:08
DEFAULT 32K buffer cache  STATIC                      0 12-may 04:33:01           12-may 04:33:01
KEEP buffer cache         STATIC          2,147,483,648 12-may 04:33:01           12-may 04:33:01
shared pool               STATIC         13,958,643,712 12-may 04:33:01           12-may 04:33:01
large pool                STATIC          2,147,483,648 12-may 04:33:01           12-may 04:33:01
java pool                 STATIC          1,073,741,824 12-may 04:33:01           12-may 04:33:01
DEFAULT buffer cache      STATIC         10,401,873,920 12-may 04:33:01           12-may 04:33:01
DEFAULT 16K buffer cache  STATIC                      0 12-may 04:33:01           12-may 04:33:01
DEFAULT 8K buffer cache   STATIC                      0 12-may 04:33:01           12-may 04:33:01
DEFAULT 4K buffer cache   STATIC                      0 12-may 04:33:01           12-may 04:33:01
RECYCLE buffer cache      STATIC                      0 12-may 04:33:01           12-may 04:33:01
KEEP buffer cache         INITIALIZING    2,147,483,648 12-may 04:33:02           12-may 04:33:04
DEFAULT buffer cache      SHRINK         10,334,765,056 20-may 21:00:12           20-may 21:00:12
shared pool               GROW           14,025,752,576 20-may 21:00:12           20-may 21:00:12
shared pool               GROW           14,092,861,440 27-may 18:06:12           27-may 18:06:12
DEFAULT buffer cache      SHRINK         10,267,656,192 27-may 18:06:12           27-may 18:06:12
shared pool               GROW           14,159,970,304 01-jun 09:07:35           01-jun 09:07:36
DEFAULT buffer cache      SHRINK         10,200,547,328 01-jun 09:07:35           01-jun 09:07:36
DEFAULT buffer cache      SHRINK         10,133,438,464 05-jun 03:00:33           05-jun 03:00:33
shared pool               GROW           14,227,079,168 05-jun 03:00:33           05-jun 03:00:33
DEFAULT buffer cache      SHRINK         10,066,329,600 08-jun 11:06:06           08-jun 11:06:07
shared pool               GROW           14,294,188,032 08-jun 11:06:06           08-jun 11:06:07

The interesting thing is that our problem ended right about the time the last shared pool expansion supposedly started.  The latch waits hosed up our database for several minutes, and it ended right about 11:06.  I suspect that the system was hung up with the bug, and once the bug finished the normal expansion work started.  Or, at least, the time didn’t get recorded until after the bug finished slowing us down.

So, I guess it’s just a bug.  This is on 11.2.0.3 on HP-UX Itanium.  I believe there is a patch set with the fix for this bug.
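If you want to check whether an Oracle home already has a one-off patch for this bug installed, something like the following should do it (just a sketch; it assumes OPatch is in its usual location and simply greps the inventory output for the bug number):

$ORACLE_HOME/OPatch/opatch lsinventory | grep 13914613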

Maybe it will be helpful for someone to see an example.

– Bobby

Categories: DBA Blogs

Links for 2015-06-08 [del.icio.us]

Categories: DBA Blogs

How to Display Base64 Encoded Image in MAF

One of the features and benefits of Oracle Mobile Application Framework is the ability to access native device services, such as your smartphone camera. In an Oracle Mobile Application Framework...

We share our skills to maximize your revenue!
Categories: DBA Blogs

Partner Webcast – Oracle Database 12c: Application Express 5.0 for Cloud development

If you have the Oracle Database, you already have Application Express. When you get Oracle Database Cloud, you get the full Application Express development platform for cloud-based applications. Since...

We share our skills to maximize your revenue!
Categories: DBA Blogs

RMAN -- 1 : Backup Job Details

Hemant K Chitale - Sun, 2015-06-07 03:57
Here's a post on how you could be misled by a simple report on the V$RMAN_BACKUP_JOB_DETAILS view.

Suppose I run RMAN Backups through a shell script.  Like this :

[oracle@localhost Hemant]$ ls -l *sh
-rwxrw-r-- 1 oracle oracle 336 Jun 7 17:30 Backup_DB_Plus_ArchLogs.sh
[oracle@localhost Hemant]$ cat Backup_DB_Plus_ArchLogs.sh
ORACLE_SID=orcl;export ORACLE_SID

rman << EOF
connect target /

spool log to Backup_DB_plus_ArchLogs.LOG

backup as compressed backupset database ;

sql 'alter system switch logfile';
sql 'alter system archive log current' ;

backup as compressed backupset archivelog all;

backup as compressed backupset current controlfile ;

EOF

[oracle@localhost Hemant]$
[oracle@localhost Hemant]$
[oracle@localhost Hemant]$
[oracle@localhost Hemant]$ ./Backup_DB_Plus_ArchLogs.sh

Recovery Manager: Release 11.2.0.2.0 - Production on Sun Jun 7 17:31:06 2015

Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.

RMAN>
connected to target database: ORCL (DBID=1229390655)

RMAN>
RMAN> RMAN> RMAN> RMAN> RMAN> RMAN> RMAN> RMAN> RMAN> RMAN> RMAN> RMAN> [oracle@localhost Hemant]$
[oracle@localhost Hemant]$

I then proceed to check the results of the run in V$RMAN_BACKUP_JOB_DETAILS.

SQL> l
1 select to_char(start_time,'DD-MON HH24:MI') StartTime, to_char(end_time,'DD-MON HH24:MI') EndTime,
2 input_type, status
3 from v$rman_backup_job_details
4* where start_time > trunc(sysdate)+17.5/24
SQL> /

STARTTIME ENDTIME INPUT_TYPE STATUS
--------------------- --------------------- ------------- -----------------------
07-JUN 17:31 07-JUN 17:31 DB FULL FAILED

SQL>

It says that I ran one FULL DATABASE Backup that failed. Is that really true ?  Let me check the RMAN spooled log.

[oracle@localhost Hemant]$ cat Backup_DB_plus_ArchLogs.LOG

Spooling started in log file: Backup_DB_plus_ArchLogs.LOG

Recovery Manager11.2.0.2.0

RMAN>
RMAN>
Starting backup at 07-JUN-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=60 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=59 device type=DISK
RMAN-06169: could not read file header for datafile 6 error reason 4
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 06/07/2015 17:31:08
RMAN-06056: could not access datafile 6

RMAN>
RMAN>
sql statement: alter system switch logfile

RMAN>
sql statement: alter system archive log current

RMAN>
RMAN>
Starting backup at 07-JUN-15
current log archived
using channel ORA_DISK_1
using channel ORA_DISK_2
channel ORA_DISK_1: starting compressed archived log backup set
channel ORA_DISK_1: specifying archived log(s) in backup set
input archived log thread=1 sequence=615 RECID=1 STAMP=881773851
channel ORA_DISK_1: starting piece 1 at 07-JUN-15
channel ORA_DISK_2: starting compressed archived log backup set
channel ORA_DISK_2: specifying archived log(s) in backup set
input archived log thread=1 sequence=616 RECID=2 STAMP=881773851
input archived log thread=1 sequence=617 RECID=3 STAMP=881773853
input archived log thread=1 sequence=618 RECID=4 STAMP=881774357
input archived log thread=1 sequence=619 RECID=5 STAMP=881774357
channel ORA_DISK_2: starting piece 1 at 07-JUN-15
channel ORA_DISK_2: finished piece 1 at 07-JUN-15
piece handle=/NEW_FS/oracle/FRA/ORCL/backupset/2015_06_07/o1_mf_annnn_TAG20150607T173112_bq83v12b_.bkp tag=TAG20150607T173112 comment=NONE
channel ORA_DISK_2: backup set complete, elapsed time: 00:00:01
channel ORA_DISK_2: starting compressed archived log backup set
channel ORA_DISK_2: specifying archived log(s) in backup set
input archived log thread=1 sequence=620 RECID=6 STAMP=881775068
input archived log thread=1 sequence=621 RECID=7 STAMP=881775068
input archived log thread=1 sequence=622 RECID=8 STAMP=881775071
channel ORA_DISK_2: starting piece 1 at 07-JUN-15
channel ORA_DISK_1: finished piece 1 at 07-JUN-15
piece handle=/NEW_FS/oracle/FRA/ORCL/backupset/2015_06_07/o1_mf_annnn_TAG20150607T173112_bq83v10y_.bkp tag=TAG20150607T173112 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
channel ORA_DISK_2: finished piece 1 at 07-JUN-15
piece handle=/NEW_FS/oracle/FRA/ORCL/backupset/2015_06_07/o1_mf_annnn_TAG20150607T173112_bq83v292_.bkp tag=TAG20150607T173112 comment=NONE
channel ORA_DISK_2: backup set complete, elapsed time: 00:00:01
Finished backup at 07-JUN-15

Starting Control File and SPFILE Autobackup at 07-JUN-15
piece handle=/NEW_FS/oracle/FRA/ORCL/autobackup/2015_06_07/o1_mf_s_881775075_bq83v3nr_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 07-JUN-15

RMAN>
RMAN>
Starting backup at 07-JUN-15
using channel ORA_DISK_1
using channel ORA_DISK_2
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
channel ORA_DISK_1: starting piece 1 at 07-JUN-15
channel ORA_DISK_1: finished piece 1 at 07-JUN-15
piece handle=/NEW_FS/oracle/FRA/ORCL/backupset/2015_06_07/o1_mf_ncnnf_TAG20150607T173117_bq83v6vg_.bkp tag=TAG20150607T173117 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 07-JUN-15

Starting Control File and SPFILE Autobackup at 07-JUN-15
piece handle=/NEW_FS/oracle/FRA/ORCL/autobackup/2015_06_07/o1_mf_s_881775080_bq83v88z_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 07-JUN-15

RMAN>
RMAN>

Recovery Manager complete.
[oracle@localhost Hemant]$

Hmm. There were *three* distinct BACKUP commands in the script file.  The first was BACKUP ... DATABASE ..., the second was BACKUP ... ARCHIVELOG ... and the third was BACKUP ... CURRENT CONTROLFILE.  All three were executed.
Only the first BACKUP execution failed.  The subsequent  two BACKUP commands succeeded.  They were for ArchiveLogs and the Controlfile.
And *yet* the view V$RMAN_BACKUP_JOB_DETAILS shows that I ran  a FULL DATABASE BACKUP that failed.  It tells me nothing about the ArchiveLogs and the ControlFile backups that did succeed !


What if I switch my strategy from using a shell script to an rman script ?

[oracle@localhost Hemant]$ ls -ltr *rmn
-rw-rw-r-- 1 oracle oracle 287 Jun 7 17:41 Backup_DB_plus_ArchLogs.rmn
[oracle@localhost Hemant]$ cat Backup_DB_plus_ArchLogs.rmn
connect target /

spool log to Backup_DB_plus_ArchLogs.TXT

backup as compressed backupset database ;

sql 'alter system switch logfile';
sql 'alter system archive log current' ;

backup as compressed backupset archivelog all;

backup as compressed backupset current controlfile;

exit

[oracle@localhost Hemant]$
[oracle@localhost Hemant]$
[oracle@localhost Hemant]$
[oracle@localhost Hemant]$ rman @Backup_DB_plus_ArchLogs.rmn

Recovery Manager: Release 11.2.0.2.0 - Production on Sun Jun 7 17:42:17 2015

Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.

RMAN> connect target *
2>
3> spool log to Backup_DB_plus_ArchLogs.TXT
4>
5> backup as compressed backupset database ;
6>
7> sql 'alter system switch logfile';
8> sql 'alter system archive log current' ;
9>
10> backup as compressed backupset archivelog all;
11>
12> backup as compressed backupset current controlfile;
13>
14> exit[oracle@localhost Hemant]$




SQL> l
1 select to_char(start_time,'DD-MON HH24:MI') StartTime, to_char(end_time,'DD-MON HH24:MI') EndTime,
2 input_type, status
3 from v$rman_backup_job_details
4 where start_time > trunc(sysdate)+17.5/24
5* order by start_time
SQL> /

STARTTIME ENDTIME INPUT_TYPE STATUS
--------------------- --------------------- ------------- -----------------------
07-JUN 17:31 07-JUN 17:31 DB FULL FAILED
07-JUN 17:42 07-JUN 17:42 DB FULL FAILED

SQL>

[oracle@localhost Hemant]$
[oracle@localhost Hemant]$ cat Backup_DB_plus_ArchLogs.TXT

connected to target database: ORCL (DBID=1229390655)

Spooling started in log file: Backup_DB_plus_ArchLogs.TXT

Recovery Manager11.2.0.2.0

Starting backup at 07-JUN-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=59 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=50 device type=DISK
RMAN-06169: could not read file header for datafile 6 error reason 4
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 06/07/2015 17:42:19
RMAN-06056: could not access datafile 6

Recovery Manager complete.
[oracle@localhost Hemant]$

Now, this time, once the first BACKUP command failed, RMAN seems to have bailed out. It didn't even try executing the subsequent BACKUP commands !

How can we differentiate between the two failed backups in V$RMAN_BACKUP_JOB_DETAILS ?

SQL> l
1 select to_char(start_time,'DD-MON HH24:MI') StartTime, to_char(end_time,'DD-MON HH24:MI') EndTime,
2 input_bytes/1048576 Input_MB, output_bytes/1048576 Output_MB,
3 input_type, status
4 from v$rman_backup_job_details
5 where start_time > trunc(sysdate)+17.5/24
6* order by start_time
SQL> /

STARTTIME ENDTIME INPUT_MB OUTPUT_MB INPUT_TYPE STATUS
--------------------- --------------------- ---------- ---------- ------------- -----------------------
07-JUN 17:31 07-JUN 17:31 71.5219727 34.878418 DB FULL FAILED
07-JUN 17:42 07-JUN 17:42 0 0 DB FULL FAILED

SQL>

The Input Bytes column does indicate that some files were backed up in the first run. Yet, it doesn't tell us how much of that was ArchiveLogs and how much was the ControlFile.


Question 1 : How would you script your backups ?  (Hint : Differentiate between the BACKUP DATABASE and the BACKUP ARCHIVELOG runs).
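One way to act on that hint -- a sketch only, not necessarily the only approach, and the log file names below are made up -- is to split the work into separate RMAN invocations, so that the datafile backup and the ArchiveLog/ControlFile backups are recorded as separate jobs, each with its own row and STATUS in V$RMAN_BACKUP_JOB_DETAILS :

ORACLE_SID=orcl;export ORACLE_SID

# Run 1 : datafiles only -- recorded as its own job in V$RMAN_BACKUP_JOB_DETAILS
rman << EOF
connect target /
spool log to Backup_DB_only.LOG
backup as compressed backupset database ;
EOF

# Run 2 : archivelogs and controlfile -- recorded as a separate job
rman << EOF
connect target /
spool log to Backup_ArchLogs_and_Ctl.LOG
sql 'alter system archive log current' ;
backup as compressed backupset archivelog all;
backup as compressed backupset current controlfile ;
EOF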

Question 2 : Can you improve your Backup Reports ?

Yes, the RMAN LIST BACKUP command is useful.  But you can't select the columns, format the output or add text  as you would with a query on V$ views.

[oracle@localhost oracle]$ NLS_DATE_FORMAT=DD_MON_HH24_MI_SS;export NLS_DATE_FORMAT
[oracle@localhost oracle]$ rman target /

Recovery Manager: Release 11.2.0.2.0 - Production on Sun Jun 7 17:51:41 2015

Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.

connected to target database: ORCL (DBID=1229390655)

RMAN> list backup completed after "trunc(sysdate)+17.5/24";

using target database control file instead of recovery catalog

List of Backup Sets
===================


BS Key Size Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ ---------------
17 375.50K DISK 00:00:01 07_JUN_17_31_13
BP Key: 17 Status: AVAILABLE Compressed: YES Tag: TAG20150607T173112
Piece Name: /NEW_FS/oracle/FRA/ORCL/backupset/2015_06_07/o1_mf_annnn_TAG20150607T173112_bq83v12b_.bkp

List of Archived Logs in backup set 17
Thrd Seq Low SCN Low Time Next SCN Next Time
---- ------- ---------- --------------- ---------- ---------
1 616 14068910 07_JUN_17_10_49 14068920 07_JUN_17_10_51
1 617 14068920 07_JUN_17_10_51 14068931 07_JUN_17_10_53
1 618 14068931 07_JUN_17_10_53 14069550 07_JUN_17_19_17
1 619 14069550 07_JUN_17_19_17 14069564 07_JUN_17_19_17

BS Key Size Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ ---------------
18 1.03M DISK 00:00:00 07_JUN_17_31_14
BP Key: 18 Status: AVAILABLE Compressed: YES Tag: TAG20150607T173112
Piece Name: /NEW_FS/oracle/FRA/ORCL/backupset/2015_06_07/o1_mf_annnn_TAG20150607T173112_bq83v292_.bkp

List of Archived Logs in backup set 18
Thrd Seq Low SCN Low Time Next SCN Next Time
---- ------- ---------- --------------- ---------- ---------
1 620 14069564 07_JUN_17_19_17 14070254 07_JUN_17_31_08
1 621 14070254 07_JUN_17_31_08 14070265 07_JUN_17_31_08
1 622 14070265 07_JUN_17_31_08 14070276 07_JUN_17_31_11

BS Key Size Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ ---------------
19 13.72M DISK 00:00:02 07_JUN_17_31_14
BP Key: 19 Status: AVAILABLE Compressed: YES Tag: TAG20150607T173112
Piece Name: /NEW_FS/oracle/FRA/ORCL/backupset/2015_06_07/o1_mf_annnn_TAG20150607T173112_bq83v10y_.bkp

List of Archived Logs in backup set 19
Thrd Seq Low SCN Low Time Next SCN Next Time
---- ------- ---------- --------------- ---------- ---------
1 615 14043833 12_JUN_23_28_21 14068910 07_JUN_17_10_49

BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
20 Full 9.36M DISK 00:00:00 07_JUN_17_31_15
BP Key: 20 Status: AVAILABLE Compressed: NO Tag: TAG20150607T173115
Piece Name: /NEW_FS/oracle/FRA/ORCL/autobackup/2015_06_07/o1_mf_s_881775075_bq83v3nr_.bkp
SPFILE Included: Modification time: 07_JUN_17_28_15
SPFILE db_unique_name: ORCL
Control File Included: Ckp SCN: 14070285 Ckp time: 07_JUN_17_31_15

BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
21 Full 1.05M DISK 00:00:02 07_JUN_17_31_19
BP Key: 21 Status: AVAILABLE Compressed: YES Tag: TAG20150607T173117
Piece Name: /NEW_FS/oracle/FRA/ORCL/backupset/2015_06_07/o1_mf_ncnnf_TAG20150607T173117_bq83v6vg_.bkp
Control File Included: Ckp SCN: 14070306 Ckp time: 07_JUN_17_31_17

BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
22 Full 9.36M DISK 00:00:00 07_JUN_17_31_20
BP Key: 22 Status: AVAILABLE Compressed: NO Tag: TAG20150607T173120
Piece Name: /NEW_FS/oracle/FRA/ORCL/autobackup/2015_06_07/o1_mf_s_881775080_bq83v88z_.bkp
SPFILE Included: Modification time: 07_JUN_17_31_18
SPFILE db_unique_name: ORCL
Control File Included: Ckp SCN: 14070312 Ckp time: 07_JUN_17_31_20

RMAN>

So, the RMAN LIST BACKUP can provide details that V$RMAN_BACKUP_JOB_DETAILS cannot provide. Yet, it doesn't tell us that a Backup failed.
.
.
.

Categories: DBA Blogs

Install Oracle RightNow Cloud Adapter in JDeveloper

Today, there are thousands of enterprise customers across the globe using Oracle RightNow CX cloud service for providing superior customer experience across multiple channels including web, contact...

We share our skills to maximize your revenue!
Categories: DBA Blogs

INDEX FULL SCAN (MIN/MAX) Not Used – How To Resolve

Oracle in Action - Sat, 2015-06-06 01:06


If you want to find the minimum or the maximum value of a column and the column is indexed, Oracle can determine it very quickly by navigating to the first (left-most) or last (right-most) leaf block in the index structure to get the Min or Max value respectively.  This access path, known as Index Full Scan (Min/Max), is extremely cost effective: instead of scanning the entire index or table, only the first or last entries in the index need to be read.

However, if the SELECT list also includes another expression with a function applied to it, the optimizer employs a Full Table Scan instead. In this post, I will demonstrate this scenario and also the solution to it.

In my test setup, I have a table HR.EMP with an index on the SALARY column.

— Let’s first query the MIN(SALARY) and SYSDATE from HR.EMP. It can be seen that the optimizer employs INDEX FULL SCAN (MIN/MAX) as desired.

SQL>select min(salary),  sysdate from hr.emp ;
select * from table (dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
SQL_ID 7c3q3s8g2ucxx, child number 0
-------------------------------------
select min(salary), sysdate from hr.emp

Plan hash value: 3077585419
----------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3 |
| 1 | SORT AGGREGATE | | 1 | 4 | |
| 2 | INDEX FULL SCAN (MIN/MAX)| EMP_SAL | 107 | 428 | |
----------------------------------------------------------------------

— Now if I try to find out MIN(SALARY) with a function applied to SYSDATE, the optimizer chooses a costly TABLE ACCESS FULL instead of INDEX FULL SCAN (MIN/MAX).

SQL>select min(salary), to_char(sysdate, 'dd/mm/yy') from hr.emp ;
select * from table (dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
SQL_ID 3dthda93cgm6v, child number 0
-------------------------------------
select min(salary), to_char(sysdate, 'dd/mm/yy') from hr.emp

Plan hash value: 2083865914
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3 (100)| |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | TABLE ACCESS FULL| EMP | 107 | 428 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------

As a workaround, we can restructure the query as shown below: an inline view gets the MIN(SALARY), so that the optimizer chooses INDEX FULL SCAN (MIN/MAX), and the function is applied to SYSDATE in the outer SELECT clause.

SQL>select min_salary, to_char(sysdt, 'dd/mm/yy') from
(select min(salary) min_salary, sysdate sysdt from hr.emp) ;
select * from table (dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
SQL_ID 5rzz6x8wzkh2k, child number 0
-------------------------------------
select min_salary, to_char(sysdt, 'dd/mm/yy') from (select
min(salary) min_salary, sysdate sysdt from hr.emp)

Plan hash value: 2631972856
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3 (100)| |
| 1 | VIEW | | 1 | 19 | 3 (0)| 00:00:01 |
| 2 | SORT AGGREGATE | | 1 | 4 | | |
| 3 | INDEX FULL SCAN (MIN/MAX)| EMP_SAL | 107 | 428 | | |
-------------------------------------------------------------------------

Hope it helps!

References:

AIOUG -North India Chapter- Performance Tuning By Vijay Sehgal – 30th May 2015
Index Full Scan (MIN/MAX) and Partitioned Table


Categories: DBA Blogs

Using startLocationMonitor to Enable GeoLocation in MAF Applications

One of the features and benefits of Oracle Mobile Application Framework is the ability to access native device services, such as the smartphone GPS. Of course, when you are developing a mobile application,...

We share our skills to maximize your revenue!
Categories: DBA Blogs

Log Buffer #426: A Carnival of the Vanities for DBAs

Pythian Group - Fri, 2015-06-05 08:57

This Log Buffer edition transcends the ordinary and loops in a few very good blog posts from Oracle, SQL Server and MySQL.


Oracle:

  • Variable selection also known as feature or attribute selection is an important technique for data mining and predictive analytics.
  • The Oracle Utilities SDK V4.3.0.0.2 has been released and is available from My Oracle Support for download.
  • This article provides a high level list of the new features that exist in HFM 11.1.2.4 and details the changes/differences between HFM 11.1.2.4 and previous releases.
  • In recent years, we’ve seen increasing interest from small-to-mid-sized carriers in transforming their policy administration systems (PAS).
  • Got a question on how easy it is to use ORDS to perform insert | update | delete on a table?

SQL Server:

  • The Importance of Database Indexing
  • Stairway to SQL Server Security Level 9: Transparent Data Encryption
  • Query Folding in Power Query to Improve Performance
  • Writing Better T-SQL: Top-Down Design May Not be the Best Choice
  • Cybercrime – the Dark Edges of the Internet

MySQL:

  • One of the challenges in storage engine design is random I/O during a write operation.
  • Fast Galera Cluster Deployments in the Cloud Using Juju
  • High availability using MySQL in the cloud
  • Become a DBA blog series – Monitoring and Trending
  • MySQL as an Oracle DBA

Learn more about Pythian’s expertise in Oracle, SQL Server and MySQL, as well as the author Fahd Mirza.

Categories: DBA Blogs

Still in recovery mode from GLOC 2015

Grumpy old DBA - Thu, 2015-06-04 17:42
The conference this year was (as usual) just absolutely the best ever.  It was amazing, the content was broad and deep, and the speakers were absolutely world class.  Thanks to everyone who was involved, but especially to all our speakers.

I was absolutely wiped out the week before and after the conference.  I think I am getting too old for this kind of thing, but the only reason people do this is to contribute to our members and to the community.

I wish I was blogging more technical stuff.  I have some very interesting things going on, but confidentiality makes it difficult to post about them.

Stay tuned?

Thanks John
Categories: DBA Blogs

Permissions in Redshift Administration

Pythian Group - Thu, 2015-06-04 12:01

As with any database administration strategy, management of Redshift requires setting and revoking permissions.  When first setting up permissions, it is ideal to establish groups as the basic unit of user access.  This keeps us from having to manage hundreds of permissions as users enter and leave organizations.  If you haven’t done this early on and are managing permissions on the user level, you can leverage some of the queries below to make the migration to group based access easy.

Another advantage we see in managing by groups: for some data warehouses we want to exclude users from running reports during ETL runs, in order to prevent contention or reporting on incomplete data.  All we have to do is run this query at the start of the jobs:

revoke usage on schema myreportschema from group report_group;

When the jobs are finished, we then grant usage again:

grant usage on schema myreportschema to group report_group;

It is easy to see users and their group assignments with the query below (note that a user can belong to more than one group):

select usesysid, usename, usesuper, nvl(groname,'default')
from pg_user u
left join pg_group g on ','||array_to_string(grolist,',')||',' like '%,'||cast(usesysid as varchar(10))||',%'
order by 3,2;

Grants in Redshift are ultimately at the object level.  That is to say, while we can grant access to all existing objects within a schema, those grants are stored at the object level.  That is why issuing the command below works for all existing tables, but tables added after this command has been run do not automatically have the select privilege granted:

grant select on all tables in schema myreportschema to group report_group;

While this is good for security granularity, it can be administratively painful.  Let us examine some strategies and tools for addressing this issue.

Redshift has the useful view, information_schema.table_privileges, that lists tables and their permissions for both users and groups.  Note that this also includes views despite the name.   AWS also provides some useful views in their Redshift utilities package in Github, most notably v_get_obj_priv_by_user which essentially flattens out information_schema.table_privileges and makes it much easier to read:

schemaname      objectname     usename  sel   ins    upd    del    ref
myreportschema  myreporttable  biuser   TRUE  FALSE  FALSE  FALSE  FALSE

However, note that this view does not report on groups and the HAS_TABLE_PRIVILEGE function that the view uses has no equivalent for groups, so to examine group privileges we can parse the ACL:

select
namespace as schemaname , item as object, pu.groname as groupname
, decode(charindex('r',split_part(split_part(array_to_string(relacl, '|'),pu.groname,2 ) ,'/',1)),0,0,1)  as select
, decode(charindex('w',split_part(split_part(array_to_string(relacl, '|'),pu.groname,2 ) ,'/',1)),0,0,1)  as update
, decode(charindex('a',split_part(split_part(array_to_string(relacl, '|'),pu.groname,2 ) ,'/',1)),0,0,1)  as insert
, decode(charindex('d',split_part(split_part(array_to_string(relacl, '|'),pu.groname,2 ) ,'/',1)),0,0,1)  as delete
from
(select
use.usename as subject,
nsp.nspname as namespace,
c.relname as item,
c.relkind as type,
use2.usename as owner,
c.relacl
from
pg_user use
cross join pg_class c
left join pg_namespace nsp on (c.relnamespace = nsp.oid)
left join pg_user use2 on (c.relowner = use2.usesysid)
where c.relowner = use.usesysid
and  nsp.nspname not in ('pg_catalog', 'pg_toast', 'information_schema')
) objs
join pg_group pu on array_to_string(relacl, '|') like '%'||pu.groname||'%';

One of the biggest challenges is to fill in missing grants.  We can do this by modifying the above query.  Here’s an example where we create the grant statements for all missing select grants for the report_group:

select 'grant select on '||namespace||'.'||item||' to group report_group;'
from
(select
use.usename as subject,
nsp.nspname as namespace,
c.relname as item,
c.relkind as type,
use2.usename as owner,
c.relacl
from
pg_user use
cross join pg_class c
left join pg_namespace nsp on (c.relnamespace = nsp.oid)
left join pg_user use2 on (c.relowner = use2.usesysid)
where
c.relowner = use.usesysid
and  nsp.nspname ='myreportschema'
and ((array_to_string(relacl, '|') not like '%report_group%' or relacl is null)
or
-- note the 'r' for the select privilege
(array_to_string(relacl, '|') like '%report_group%' and charindex('r', split_part( split_part( array_to_string( relacl, '|' ), 'report_group', 2 ) , '/', 1 ) ) = 0 )
)
and c.relkind   <> 'i'  -- is not an index
) objs;

 

Learn more about Pythian’s expertise in Database Managed Services.

Categories: DBA Blogs

Tracking ASM Metrics

Pythian Group - Wed, 2015-06-03 14:36
Collecting ASM IO Metrics

If you are an Oracle DBA, then it is quite likely that Oracle ASM is used as the storage management for at least some of the databases you manage.

Eventually you will want to see ASM metrics that can be used to track the performance of the storage over time.  There are built-in data dictionary views that allow monitoring ASM IO performance at the database, instance, diskgroup and disk level.  These show only current values, however, so they must be collected and saved to be of much use.

Point your browser at your favorite search engine and search for the terms Oracle, ASM and metrics.  Quite likely near the top of the relevant hits will be bdt’s oracle blog. Bertrand Drouvot has created asm_metrics.pl, an excellent utility for monitoring and reporting on ASM IO metrics.  I have used this utility several times now with good results.

As good as asm_metrics.pl is, however, I have on several occasions wanted something slightly different.  The asm_metrics.pl script produces formatted report output, while my preference is to save the data in a CSV file.  By saving all available metrics in this manner it is not necessary to decide ahead of time how the data is to be used and then choose the appropriate command line options.

When all of the metrics are preserved as data there is then no limitation on the types of post-data-collection analysis that can be performed. If for instance, you would like to break down all of the data by disk for a particular disk group, the data will be at your disposal to do so.

And so, the following collection of scripts was created.  First, I will provide a brief description of each script, followed by detailed usage.

asm-metrics-collector.pl

This is the main Perl script used for collecting ASM metrics data.

asm-metrics-aggregator.pl

This Perl script can be used as a filter to aggregate previously collected metrics.

asm-metrics-chart.pl

Output of either of the previous scripts can be used as input to this script. asm-metrics-chart.pl will create an XLSX format Excel file with line charts for the columns you select.

These scripts try not to make too many decisions for you. Their job is simply to get the data, perform some aggregations as needed, and chart the data.

While there are many options that could be added to the scripts, I have attempted to keep from making them unnecessarily complicated.  Standard Linux command line utilities can be used for many operations that would otherwise require a fair bit of code complexity.

Don’t have Linux? In this day of free hypervisors, anyone can install and use Linux.

Using the ASM Scripts

The following are more detailed descriptions of each script and their usage.

For each script the --help option will provide some explanation of the options, while the --man option may be used to show extended help.

The Perl Environment

All recent versions of the Oracle RDBMS software ship with Perl installed.  The Oracle based Perl installations already include the DBI and DBD::Oracle modules which are necessary for connecting to the database.

If you don’t already have a version of Perl with the DBI and DBD::Oracle modules installed, simply use the one installed with Oracle:

 $ORACLE_HOME/perl/bin/perl asm-metrics-collector.pl ...

The asm-metrics-chart.pl script requires the module Excel::Writer::XLSX.

If there is a local copy of Perl that you can add modules to, simply install the Excel::Writer::XLSX module.

Otherwise the module can be installed into your home directory.  Setting the PERL5LIB environment variable will allow the script to find the local copy of the module.
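For example, something along these lines (a sketch only; the ~/perl5lib path is just an example and should be whatever directory the module was actually installed into):

# point PERL5LIB at the directory containing the locally installed module
export PERL5LIB=$HOME/perl5lib/lib/perl5:$PERL5LIB

# the Oracle-supplied perl can then locate Excel::Writer::XLSX
$ORACLE_HOME/perl/bin/perl asm-metrics-chart.pl --help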

asm-metrics-collector.pl

This is the main Perl script used for collecting ASM metrics data.

Connecting to the database

Connections can be made either via bequeath or TNS as SYSDBA or any user with appropriate privileges.

The following example makes a bequeath connection to the local instance specified in $ORACLE_SID.  As no username or database is specified in the options this connection is the same as ‘/ as sysdba’.

asm-metrics-collector.pl -interval 10 --iterations 20 --delimiter ,

This is the most basic usage of this script.  The --interval parameter refers to the number of seconds between snapshots.

With this basic usage, not all columns are captured.  The --help and --man options display the optional columns.

This next example shows how to include optional columns in the output:

 asm-metrics-collector.pl -interval 5 -iterations 5 -delimiter , --opt-cols DISK_NAME COLD_BYTES_WRITTEN PATH > asm_metrics.csv

And here is an example that causes all available columns to be output:

  asm-metrics-collector.pl -interval 5 -iterations 5 -delimiter , --opt-cols ALL-COLUMNS > asm_metrics_allcols.csv

All output is to STDOUT and so must be redirected as needed.

You may have noticed there is no password on the command line, and indeed, there is no provision for a password on the command line.  In the interest of security, the password must either be entered from the keyboard or sent to the script via STDIN.  Following are some examples of connecting with a user that requires a password.

This first example will require you to type in the password as the script appears to hang:

asm-metrics-collector.pl --database orcl --username scott --sysdba > my-asm.csv

The following two examples get the password from a file:

 asm-metrics-collector.pl --database orcl --username scott --sysdba < password.txt > my-asm.csv
cat password.txt | asm-metrics-collector.pl --database orcl --username scott --sysdba > my-asm.csv

And just for fun, this method works with the Bash Shell:

asm-metrics-collector.pl --database orcl --username scott --sysdba <<< scott > my-asm.csv
asm-metrics-aggregator.pl

By default the output of asm-metrics-collector.pl will include a row for each disk in each diskgroup.  Should you wish to observe and chart the read and write times at diskgroup level, the presence of one row per disk causes that operation to be somewhat difficult.  The following brief shell script will read output created by asm-metrics-collector.pl, aggregate the chosen columns at the requested level and write it to STDOUT in CSV format.

The aggregation level is determined by the --grouping-cols option, while the columns to aggregate are specified with the --agg-cols option.


INPUT_DATAFILE='logs/asm-oravm-data-20150523-172525.csv'
OUTPUT_DATAFILE='logs/asm-oravm-aggtest.csv'

./asm-metrics-aggregator.pl  \
        --grouping-cols DISKGROUP_NAME \
        --agg-cols READS WRITES READ_TIME WRITE_TIME \
        --display-cols  DISPLAYTIME ELAPSEDTIME DBNAME DISKGROUP_NAME READS \
                WRITES READ_TIME AVG_READ_TIME WRITE_TIME \
                AVG_WRITE_TIME BYTES_READ BYTES_WRITTEN  \
        -- ${INPUT_DATAFILE}   \
        > ${OUTPUT_DATAFILE}

You may be wondering about the purpose of the double dashes “--” that appear in the command line.

This is how the Perl option processor Getopt::Long knows that there are no more options on the command line.  As the --display-cols option can take several arguments, some method must be used to indicate the end of the arguments when the text that follows is not to be processed as part of the option.  The “--” is the option list terminator, and will be well known to long time Unix and Linux users.

This script also does one set of calculations behind the scenes; the average read and write times are calculated at the current aggregation level.

What if you don’t know which columns are available in the input file? Simply use the --list-available-cols option with an input file and the script will output the available columns and exit.


> ./asm-metrics-aggregator.pl --list-available-cols logs/asm-oravm-data-20150523-172525.csv
DISPLAYTIME
SNAPTIME
ELAPSEDTIME
INSTNAME
DBNAME
GROUP_NUMBER
DISK_NUMBER
DISKGROUP_NAME
READS
WRITES
READ_TIME
AVG_READ_TIME
WRITE_TIME
AVG_WRITE_TIME
BYTES_READ
BYTES_WRITTEN
DISK_NAME
READ_ERRS

As you have probably noticed, input to this script is from STDIN.

asm-metrics-chart.pl

Now it all gets more interesting as we can visualize the data collected.  There are many methods available to accomplish this.  I chose Excel as it is ubiquitous and easy to work with.  The previously discussed Perl module Excel::Writer::XLSX  makes it relatively easy to create Excel files complete with charts, directly from CSV data.

The following command line will create the default Excel file asm-metrics.xlsx with line charts for reads and writes. The --worksheet-col option specifies that each diskgroup will be shown on a separate worksheet.

./asm-metrics-chart.pl --worksheet-col DISKGROUP_NAME \
--chart-cols READS WRITES -- logs/asm-oravm-20150512_01-agg-dg.csv

This next example provides a non-default name for the Excel file:

./asm-metrics-chart.pl \
        --worksheet-col DISKGROUP_NAME \
        --spreadsheet-file oravm-asm-metrics.xlsx \
        --chart-cols READS AVG_READ_TIME WRITES AVG_WRITE_TIME  \
        -- logs/asm-oravm-aggtest.csv
Modifying the Data with Command Line Tools

Suppose you have two disk groups, DATA and FRA.  The DATA disk group has five disks and FRA has two disks. In addition there are two RAC databases with two instances each. For each iteration the data collected by asm-metrics-collector.pl will have 5 rows per DATA, 2 rows per FRA, multiplied by the number of instances, so 40 rows per iteration.

Should you wish to see only the DATA rows for a single instance, and then aggregate these, it can be done via a combination of command line tools and these scripts.

First let’s get the list of columns and number them so we know how to find the instance:


> ./asm-metrics-aggregator.pl --list-available-cols logs/asm-oravm-20150512_01.csv|nl
     1  DISPLAYTIME
     2  SNAPTIME
     3  ELAPSEDTIME
     4  INSTNAME
     5  DBNAME
     6  GROUP_NUMBER
     7  DISK_NUMBER
     8  HEADER_STATUS
     9  REDUNDANCY
    10  OS_MB
    11  TOTAL_MB
    12  FREE_MB
    13  HOT_USED_MB
    14  COLD_USED_MB
    15  DISKGROUP_NAME
    16  DISK_NAME
    17  FAILGROUP
    18  LABEL
    19  PATH
    20  UDID
    21  PRODUCT
    22  CREATE_DATE
    23  MOUNT_DATE
    24  REPAIR_TIMER
    25  PREFERRED_READ
    26  VOTING_FILE
    27  SECTOR_SIZE
    28  FAILGROUP_TYPE
    29  READS
    30  WRITES
    31  READ_ERRS
    32  WRITE_ERRS
    33  READ_TIME
    34  AVG_READ_TIME
    35  WRITE_TIME
    36  AVG_WRITE_TIME
    37  BYTES_READ
    38  BYTES_WRITTEN
    39  HOT_READS
    40  HOT_WRITES
    41  HOT_BYTES_READ
    42  HOT_BYTES_WRITTEN
    43  COLD_READS
    44  COLD_WRITES
    45  COLD_BYTES_READ
    46  COLD_BYTES_WRITTEN

So now that we know that INSTNAME is column #4, let’s see what the available instances are.


> cut -f4 -d, logs/asm-oravm-20150512_01.csv | sort -u
INSTNAME
oravm1
oravm2

To get the headers and the data for only instance oravm1, and only for the DATA diskgroup, aggregated by diskgroup:


(head -1 logs/asm-oravm-20150512_01.csv ; grep ',oravm1,.*,DATA,' logs/asm-oravm-20150512_01.csv ) | \
  ./asm-metrics-aggregator.pl --grouping-cols DISKGROUP_NAME \
  --agg-cols READS WRITES READ_TIME WRITE_TIME \
  --display-cols DISPLAYTIME ELAPSEDTIME DBNAME DISKGROUP_NAME READS \
                 WRITES READ_TIME AVG_READ_TIME WRITE_TIME \
                 AVG_WRITE_TIME BYTES_READ BYTES_WRITTEN \
  -- > new-file.csv

That should get you started on modifying the data via standard command line utilities.

To Do

These scripts do not know about any of the ASM enhancements found in 12c, so there is some room for improvement there. So far they have fit my needs, but you may have some ideas to make these scripts better. Or (heaven forbid) you found a bug. Either way, please try them out and let me know.

The scripts can all be found in asm-metrics on github.

Discover more about Pythian’s expertise in Oracle and Jared Still.

Categories: DBA Blogs

Handling the Leap Second – Linux

Pythian Group - Tue, 2015-06-02 13:31

Last week I published a blog post titled “Are You Ready For the Leap Second?“, and by looking at the blog statistics I could tell that many of you read it, and that’s good, because you became aware of the risks that the leap second on June 30th, 2015 introduces. On the other hand, I must admit I didn’t provide clear instructions that you could use to avoid all possible scenarios. I’ve been looking into this for a good while and I think the official RedHat announcements and My Oracle Support notes are confusing. This blog post is my attempt to explain how to avoid the possible issues.

Update (June 9th, 2015): Made it clear in the text below that ntpd’s slewing mode (ntpd -x) is mandatory for Oracle Grid Infrastructure, and therefore for RAC too.

The complexity of solving these problems comes from the fact that there are multiple contributing factors. The behavior of the system will depend on a combination of these factors.
In the coming sections I’ll try to explain what exactly you should pay attention to and what you should do to avoid problems. The content of this post is fully theoretical and based on the documentation I’ve read. I have NOT tested it, so it may behave differently. Please, if you notice any nonsense in what I’m writing, let me know by leaving a comment!

1. Collect the data

The following information will be required for you to understand what you’re dealing with:

  1. OS version and kernel version:
    $ cat /etc/issue
    Oracle Linux Server release 6.4
    Kernel \r on an \m
    
    $ uname -r
    2.6.39-400.17.1.el6uek.x86_64
    
  2. Is NTP used and which version of NTP is installed:
    $ ps -ef | grep ntp
    oracle    1627  1598  0 02:06 pts/0    00:00:00 grep ntp
    ntp       7419     1  0 May17 ?        00:00:17 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
    
    $ rpm -qa | grep ntp-
    ntp-4.2.4p8-3.el6.x86_64
    
  3. Version of tzdata and the configuration of /etc/localtime:
    $ rpm -qa | grep tzdata-
    tzdata-2012j-1.el6.noarch
    
    $ file /etc/localtime
    /etc/localtime: timezone data, version 2, 5 gmt time flags, 5 std time flags, no leap seconds, 235 transition times, 5 abbreviation chars
    
2. Check the kernel

Here’s a number of bugs that are related to leap second handling on Linux:

  1. System hangs on printing the leap second insertion message – This bug will hang your server at the time when NTP notifies the kernel about the leap second, and that can happen anytime on the day before the leap second (in our case, anytime on June 30th, 2015). It’s fixed in kernel-2.6.9-89.EL (RHEL4) and kernel-2.6.18-164.el5 (RHEL5).
  2. Systems hang due to leap-second livelock – Because of this bug systems repeatedly crash due to the NMI Watchdog detecting a hang. This becomes effective when the leap second is added. The note doesn’t specify exactly which versions fix the bug.
  3. Why is there high CPU usage after inserting the leap second? – This bug causes futex-active applications (i.e. Java) to start consuming 100% CPU. Based on what’s discussed in this email in the Linux Kernel Mailing List Archive, it’s triggered by a mismatch between the timekeeping and hrtimer structures, which the leap second introduces. The document again does not clearly specify which versions fix the problem; however, this “Kernel Bug Fix Update” mentions these symptoms as fixed in 2.6.32-279.5.2.el6.

MOS Note: "How Leap Second Affects the OS Clock on Linux and Oracle VM (Doc ID 1453523.1)" mentions that kernels 2.4 to 2.6.39 are affected, but I'd like to know the exact versions. I've searched a lot, but I haven't found much, so here are the ones that I did find:

I’m quite sure by reading this you’re thinking: “What a mess!”. And that’s true. I believe, the safest approach is to be on kernel 2.6.39-200.29.3 or higher.

3. NTP is used

You’re using NTP if the ntpd process is running. In the outputs displayed above it’s running and has the following arguments: ntpd -u ntp:ntp -p /var/run/ntpd.pid -g. The behavior of the system during the leap second depends on which version of NTP you use and what’s the environment.

  • ntp-4.2.2p1-9 or higher (but not ntp-4.2.6p5-19.el7, ntp-4.2.6p5-1.el6 and ntp-4.2.6p5-2.el6_6) configured in slew mode (with option "-x") – The leap second is not added by the kernel; instead, the extra time is added by increasing the length of each second over a ~2000-second period, based on the difference between the server's time and the time received from NTP after the leap second. The clock is never turned backward. This is the configuration you want because:
    • Time never goes back, so there will be no impact to the application logic.
    • Strange time values like 23:59:60 are not used, so you won’t hit any DATE and TIMESTAMP datatype limitation issues.
    • As the leap second is not actually added, it should be possible to avoid all 3 kernel bugs mentioned above by using this configuration. In many cases updating NTP is much simpler than a kernel upgrade, so if you're still on an affected kernel, use this option to bypass the bugs.

    The drawbacks of this configuration are related to the fact that the leap second is smeared out over a longer period of time:

    • This probably is not usable for applications requiring very accurate time.
    • This may not be usable for some clusters where all nodes must have exactly the same clock time, because NTP updates are usually received only every 1 to 18 minutes and, adding the ~2000 seconds of time adjustment in slew mode, the clocks could be off for as long as ~50 minutes. Please note that slewing mode (ntpd -x) is mandatory for Oracle Grid Infrastructure, as documented in the Oracle® Grid Infrastructure Installation Guides for 11g Release 2 and 12c Release 1.
  • ntp-4.2.2p1-9 or higher configured without slew mode (no "-x" option) – NTP will notify the kernel about the upcoming leap second some time during June 30th, and the leap second will be added as an extra "23:59:59" second (time goes backward by one second). You will want to be on a kernel with all the fixes present.
  • below ntp-4.2.2p1-9 – NTP will notify the kernel about the upcoming leap second some time during June 30th, and, depending on the environment, the leap second will be added either as an extra "23:59:59" second (time goes backward by one second) or by freezing the time for one second at midnight.

Extra precaution: if you're running NTP, make sure your /etc/localtime does not include leap seconds by running "file /etc/localtime" and confirming that the output includes the message "no leap seconds".
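
Both checks can be done quickly on the command line (a sketch; the OPTIONS line in /etc/sysconfig/ntpd is a RHEL/OL convention and may differ on other distributions, and the outputs shown are only examples):

    $ ps -ef | grep [n]tpd                 # look for the "-x" flag in the command line
    ntp       7419     1  0 May17 ?        00:00:17 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
    $ grep ^OPTIONS /etc/sysconfig/ntpd    # the flag is normally configured here
    OPTIONS="-u ntp:ntp -p /var/run/ntpd.pid -g"
    $ file /etc/localtime | grep -o 'no leap seconds'
    no leap seconds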

4. NTP is NOT used

If NTP is not used, the time is managed locally by the server. The time is most likely off already, so I really do recommend enabling NTP in slew mode as described above; this is the right moment to do so.
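
On RHEL/OL 6 the slew option is typically added through /etc/sysconfig/ntpd; here is a minimal sketch of enabling it (assuming the ntp package is already installed and the stock SysV init scripts are in place):

    # vi /etc/sysconfig/ntpd                 # add "-x" to the OPTIONS line, for example:
    OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid -g"
    # chkconfig ntpd on                      # make sure ntpd starts at boot
    # /etc/init.d/ntpd restart               # restart ntpd with the new options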

If you have tzdata-2015a or higher installed, the information about the leap second on June 30th, 2015 is also available locally on the server, but that doesn't yet mean it's going to be added. Also, if NTP is not used and the leap second is added locally, it will appear as "23:59:60", which is an unsupported value for DATE and TIMESTAMP columns, so this is a configuration you don't want to use. Here are the different conditions:

  • You’re below tzdata-2015a – the leap second will not be added.
  • You’re on tzdata-2015a or higher and “file /etc/localtime” includes message “X leap seconds”, where X is a number – the leap second will be added as “23:59:60″ and will cause problems for your DATE/TIMESTAMP datatypes. You don’t want this configuration. Disable leap second by copying the appropriate timezone file from /usr/share/zoneinfo over /etc/localtime. It’s a dynamic change, no reboots needed. (Timezone files including the leap seconds are located in /usr/share/zoneinfo<strong>/right</strong>)
  • “file /etc/localtime” includes message “no leap seconds” – the leap second will not be added.
The recommendations

Again I must say this is a theoretical summary on how to avoid leap second issues on Linux, based on what’s written above. Make sure you think about it before implementing as you’re the one who knows your own systems:

  • Single-node servers, or clusters where time between nodes can differ – Upgrade to ntp-4.2.2p1-9 or higher and configure it in slew mode (option "-x"). This should avoid the kernel bugs too, but due to the lack of accurate documentation it's still safer to be on kernel 2.6.39-200.29.3 or higher.
  • Clusters or applications with very accurate time requirements – NTP with slew mode is not suitable, as it's unpredictable when it will start adjusting the time on each server. You want to be on kernel 2.6.39-200.29.3 or higher. NTP should be enabled. The leap second will be added as an extra "23:59:59" second (the time will go backward by one second). Oracle Database/Clusterware should detect the time drifting and deal with it. Check MOS for any bugs related to time drifting for the versions you're running.
  • I don't care about the time accuracy, I can't update any packages, but I need my systems up at any cost – The simplest solution is to stop NTP on June 29th and start it up again on July 1st, so that the server is left unaware of the leap second. Also, you need to make sure that /etc/localtime does not contain the leap second for June 30th, 2015, as explained above (see also the note after this list).
    -- on June 29th (UTC)
    # /etc/init.d/ntpd stop
    # date -s "`date`"    (reset the system clock)
    -- on July 1st (UTC)
    # /etc/init.d/ntpd start
  • Very accurate time requirements + time reduction is not allowed – I don’t know. I can’t see how this can be implemented. Does anyone have any ideas?
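
A note on the "I need my systems up at any cost" option above: if the server could be rebooted between June 29th and July 1st, you may also want to temporarily prevent ntpd from starting at boot (a sketch for SysV-init systems such as OL/RHEL 6):

    -- on June 29th (UTC), together with stopping ntpd
    # chkconfig ntpd off
    -- on July 1st (UTC), together with starting ntpd
    # chkconfig ntpd on
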
Post Scriptum

Initially I couldn't understand why this extra second caused so much trouble. Don't we change the time by a round hour twice a year without any issues? I found the answer during my research, and it's obvious: servers work in UTC time, which does not have daylight saving time changes; the timezone information is added later, just for representation purposes. UTC time is continuous and predictable, but the leap second breaks this continuity, and that's why it is so difficult to handle. It's also a known fact that Oracle Databases rely heavily on gettimeofday() system calls, and these work in UTC too.

 

Discover more about Pythian’s Oracle Ace Maris Elsins.

Categories: DBA Blogs