
DBA Blogs

Why Data Virtualization Is so Vital

Kubilay Çilkara - Tue, 2015-11-24 16:35
In today’s day and age, it probably seems like every week you hear about a new form of software you absolutely have to have. However, as you’re about to find out, data virtualization is actually worth the hype.
The Old Ways of Doing Things
Traditionally, data management has been a cumbersome process, to say the least. Usually, it has meant data replication, bulk data movement, or intermediary connectors and servers used to pull off point-to-point integration. And in some situations, it's a combination of the three.

Like we just said, though, these methods were never really ideal. They were just the only options given the complete lack of alternatives. That's the main reason you're seeing them less and less: the moment something better came along, companies jumped all over it. However, their diminishing utility can also be traced to three main factors:
  • High costs related to data movement
  • The astronomical growth in data (also referred to as Big Data)
  • Customers that expect real-time information
These three elements are probably fairly self-explanatory, but the last one is worth elaborating on. Customers these days really don't understand why they can't get all the information they want exactly when they want it. How could they, when they can go online and get their hands on practically any data they could ever want thanks to the likes of Google? If you try to explain that your company can't do this, they'll most likely have a hard time believing you. Worse, they may believe you, but assume the problem is specific to your company and that some competitor won't have this issue.
Introducing Data Virtualization
It was only a matter of time before this problem was eventually addressed. Obviously, when so many companies are struggling with this kind of challenge, there’s quite the incentive for another one to solve it.

That's where data virtualization comes into play. Companies with critical information spread across the enterprise in all kinds of formats and locations no longer have to struggle to get their hands on it. Instead, they can use a virtualization platform to search out what they need.
Flexible Searches for Better Results
It wouldn’t make much sense for this type of software to not have a certain amount of agility built in. After all, that’s sort of its main selling point. The whole reason companies invest in it is because it doesn’t get held back by issues with layout or formatting. Whatever you need, it can find.

Still, for best results, many now offer a single interface that can be used to separate and extract aggregates of data in all kinds of ways. The end result is a flexible search which can be leveraged toward all kinds of ends. It's no longer just about being able to find any type of information you need, but about finding it in the most efficient and productive way possible.
Keep Your Mainframe
One misconception some companies have about data virtualization is that it requires certain adjustments to your mainframe before it can truly be effective. This makes sense because, for many platforms, it is definitely the case. Those are earlier versions, though, and some that just aren't of the highest quality.

With really good versions, though, you can basically transform your company’s mainframe into a virtualization platform. Such an approach isn’t just cost-effective. It also makes sure you aren’t wasting resources, including time, addressing the shortcomings of your current mainframe, something no company wants to do.

Don't get turned off from taking a virtualization approach to your cache of data because you're imagining a long list of chores that will be necessary for transforming your mainframe. Instead, just be sure you invest in a high-end version that will actually transform your current mainframe into something much better.
A Better Approach to Your Current Mainframe
Let's look at some further benefits that come from taking this approach. First, if the program you choose comes with a high-performance server, you'll immediately eliminate the redundancy of point-to-point integration. This will definitely give you better performance and manageability. Plus, if you ever want to scale up, this will make it much easier to do so.

Proper data migration is key to a good virtualization process. If it is done right, the end user won't have to worry about corrupted data, and communication between machines will be crystal clear. If you divert processing-intensive data mapping and transformation work away from your mainframe's General Purpose Processor to the zIIP specialty engine, you'll dramatically reduce your MIPS capacity usage and, therefore, also reduce your company's TCO (Total Cost of Ownership).

Lastly, maybe you'd like to exploit every last piece of value you can derive from your mainframe data. If so, good virtualization software will not only make this possible, but do so in a way that lets you turn all of your non-relational mainframe data into relational formats that any business analytics or intelligence application can use.
Key Features to Look for in Your Virtualization Platform
If you’re now sold on the idea of investing in a virtualization platform, the next step is getting smart about what to look for. As you can imagine, you won’t have trouble finding a program to buy, but you want to make sure it’s actually going to be worth every penny.

The first would be, simply, the number of data providers supported. You want to be able to address everything from big data to machine data to syslogs, distributed and mainframe. Obviously, this will depend a lot on your current needs, but think about the future too.

Then, there's the same to think about in terms of data consumers. We're talking about the cloud, analytics, business intelligence and, of course, the web. Making sure you will be able to stay current for some time is very important: technology changes quickly, and the better your virtualization platform is, the longer you'll have before needing to upgrade. Look closely at the migration process, and at whether or not the provider can work with your IT team to improve workflow. This will help your company get back on track more quickly and with better results.

Finally, don't forget to look at costs, especially where scalability is concerned. If you have plans of getting bigger in the future, you don't want it to take a burdensome investment to do so. As you can see, virtualization platforms definitely live up to the hype. You just have to be sure you spend your money on the right kind.
Mike Miranda writes about enterprise software and covers products offered by software companies like Rocket Software, on topics such as terminal emulation, enterprise mobility and more.
Categories: DBA Blogs

Advanced Compression Option Caveat in Oracle 12c

Pythian Group - Tue, 2015-11-24 12:18

 

Oracle 12c introduced a new capability to move a partition online, without any interruptions to DML happening at the same time. But, there’s a catch. So far we’ve been able to use basic table compression without having to worry about any extra licensing – it was just a plain EE feature.

If you are planning to use the online partition move functionality, carefully check if you’re not using basic compression anywhere. For example:

create tablespace data datafile '+DATA' size 1g
/

create user foo identified by bar
default tablespace data
quota unlimited on data
/

grant create session, create table to foo
/

connect foo/bar

create table test (x int, y varchar2(20))
partition by range (x)
(
partition p1 values less than (100) tablespace data compress,
partition p2 values less than (200) tablespace data,
partition p3 values less than (300) tablespace data
)
/

So we now have this, and our licensing is still as we know it:

select partition_name, compression, compress_for from user_tab_partitions
/
PARTITION_NAME COMPRESS COMPRESS_FOR
------------------------------ -------- ------------------------------
P1 ENABLED BASIC
P2 DISABLED
P3 DISABLED

We can use the new feature on partition p3:

alter table test move partition p3
online
/

Or, we can use the traditional means to compress the partition p2:

alter table test move partition p2
compress
/

But as soon as we do this move “online”, we are required to purchase the Advanced Compression Option:

alter table test move partition p2
compress
online
/

And, even sneakier:
alter table test move partition p1
online
/

Notice how partition p1 – which was previously compressed – was also moved online into a compressed format:

select partition_name, compression, compress_for from user_tab_partitions
/

PARTITION_NAME COMPRESS COMPRESS_FOR
------------------------------ -------- ------------------------------
P1 ENABLED BASIC
P2 ENABLED BASIC
P3 DISABLED

 

And that, therefore, required the Advanced Compression Option.

Also note that the usage of this is not caught by dba_feature_usage_statistics (tested on 12.1.0.2):

select name, currently_used from dba_feature_usage_statistics where lower(name) like '%compress%';

NAME CURRE
---------------------------------------------------------------- -----
Oracle Advanced Network Compression Service FALSE
Backup ZLIB Compression FALSE
Backup BZIP2 Compression FALSE
Backup BASIC Compression FALSE
Backup LOW Compression FALSE
Backup MEDIUM Compression FALSE
Backup HIGH Compression FALSE
Segment Maintenance Online Compress FALSE
Compression Advisor FALSE
SecureFile Compression (user) FALSE
SecureFile Compression (system) FALSE
HeapCompression FALSE
Advanced Index Compression FALSE
Hybrid Columnar Compression FALSE
Hybrid Columnar Compression Row Level Locking FALSE

15 rows selected.

I also tried bouncing the database, and the data still wasn't updated. I would've expected this to show up under "Segment Maintenance Online Compress", but in my tests, it did not.

This feature restriction isn’t documented anywhere in the official product documentation – at least not that I could find. The only place where I could find this information was in this Oracle document.
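
If you want to inventory where basic compression is already in use before relying on online moves, a dictionary query along these lines can help (a sketch; run as a DBA user):

select owner, table_name, partition_name
from dba_tab_partitions
where compression = 'ENABLED'
and compress_for = 'BASIC'
/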

 

Discover more about our experience in the world of Oracle.

Categories: DBA Blogs

Implementing Fuzzy Search in SQL Server – Part 2: Levenshtein Distance

Pythian Group - Mon, 2015-11-23 15:18

 

The Levenshtein Distance, as discussed in my last post, is a way to measure how far two strings are located from each other. There are several T-SQL implementations of this functionality, as well as many compiled versions. In addition, the MDS library in SQL Server has a Similarity function which uses the Levenshtein Distance to find how similar two words are.
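
For reference, the MDS call looks roughly like this (a sketch; the mdq schema and method code 0 for Levenshtein come from the MDS documentation, but verify them against your installation):

-- method 0 = Levenshtein edit distance; returns a similarity score between 0 and 1
SELECT mdq.Similarity(N'Johnson', N'Jonson', 0, 0.0, 0.0);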

In this post, I’ve done a simple comparison of performance using a C# CLR implementation of Levenshtein Distance (The code is from the Wiki), and a well written T-SQL implementation from Arnold Fribble.

As many of you might expect, the C# implementation is much quicker, needing only 2,504 ms to run through a dictionary table of 203,118 words. The T-SQL implementation took 42,718 ms for the same work.

[Chart: Levenshtein Distance Comparison – a comparison of the two implementations]

 

Levenshtein Distance CLR

To implement the Levenshtein Distance CLR, run this SQL Script

 

Levenshtein Distance T-SQL

To implement the Levenshtein Distance in T-SQL, run the below code. Please note that this function has a cut-off value (@d) where it simply gives up and returns -1.


SET QUOTED_IDENTIFIER ON
GO
SET ANSI_NULLS ON
GO

CREATE FUNCTION edit_distance_within(@s nvarchar(4000), @t nvarchar(4000), @d int)
RETURNS int
AS
BEGIN
  DECLARE @sl int, @tl int, @i int, @j int, @sc nchar, @c int, @c1 int,
    @cv0 nvarchar(4000), @cv1 nvarchar(4000), @cmin int
  SELECT @sl = LEN(@s), @tl = LEN(@t), @cv1 = '', @j = 1, @i = 1, @c = 0
  WHILE @j <= @tl
    SELECT @cv1 = @cv1 + NCHAR(@j), @j = @j + 1
  WHILE @i <= @sl
  BEGIN
    SELECT @sc = SUBSTRING(@s, @i, 1), @c1 = @i, @c = @i, @cv0 = '', @j = 1, @cmin = 4000
    WHILE @j <= @tl
    BEGIN
      SET @c = @c + 1
      SET @c1 = @c1 - CASE WHEN @sc = SUBSTRING(@t, @j, 1) THEN 1 ELSE 0 END
      IF @c > @c1 SET @c = @c1
      SET @c1 = UNICODE(SUBSTRING(@cv1, @j, 1)) + 1
      IF @c > @c1 SET @c = @c1
      IF @c < @cmin SET @cmin = @c
      SELECT @cv0 = @cv0 + NCHAR(@c), @j = @j + 1
    END
    IF @cmin > @d BREAK
    SELECT @cv1 = @cv0, @i = @i + 1
  END
  RETURN CASE WHEN @cmin <= @d AND @c <= @d THEN @c ELSE -1 END
END
GO
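
As a usage sketch (the dbo.words dictionary table and its word column are hypothetical), finding all entries within two edits of a search term could look like this:

-- edit_distance_within returns -1 beyond the cut-off, so filter on >= 0
SELECT word
FROM dbo.words
WHERE dbo.edit_distance_within(word, N'levenstien', 2) >= 0;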

 

Discover more about our expertise in SQL Server

Categories: DBA Blogs

ORA-10173: Dynamic Sampling Time-Out Error

Pythian Group - Mon, 2015-11-23 15:06

 

Recently, on a heavily used and freshly upgraded 12.1.0.2 warehouse-type database, we started seeing lots of ORA-10173 errors dumped into the alert log. The information out there on this error is somewhat sparse, and it is often linked to Tuning Advisor functionality. Since we're not running that advisor on this database, a little digging was in order.

What I always do as a first step, when I am confronted with an error and am not certain why and exactly where it is raised, is to set up an error trap. In Oracle, this can be done by setting an errorstack event, like this:

alter session set events '10173 trace name errorstack level 1';

Please note: whenever possible you’d want to set this on the smallest scope possible – starting with the session level, eventually in a login trigger for multiple sessions, and only when all that fails, on the system level using the alter system statement.

Once you have trapped an occurrence or two, you can disable it again by running:
alter session set events '10173 trace name errorstack off';

Upon activating this event, the next time this exception is raised, Oracle will write a trace file to the diagnostics repository. The two most relevant pieces of information in the trace file are the current SQL query:

----- Current SQL Statement for this session (sql_id=anbp9r5n8ysu6) -----
SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring optimizer_features_enable(default) no_parallel result_cache(snapshot=3600) */ SUM(C1) FROM (SELECT /*+ qb_name("innerQuery") */ 1 AS C1 FROM (SELECT "INV"."CASE_UPC" "ITEM_1","INV"."WHS_NBR" "ITEM_2",SUBSTR("INV"."CASE_UPC",3,11) "ITEM_3" FROM "XX"."XX_WHS_INV" "INV","XX"."XX_CASE_DIM" "CD2","XX"."XX_WHS_DIM" "WD" WHERE "WD"."WHS_DESC" LIKE '%PEY%' AND "CD2"."WHS_NBR"="INV"."WHS_NBR" AND "CD2"."CASE_UPC"="INV"."CASE_UPC" AND "INV"."WHS_NBR"="WD"."WHS_NBR" GROUP BY "INV"."CASE_UPC","INV"."WHS_NBR",SUBSTR("INV"."CASE_UPC",3,11)) "VW_DIS_13") innerQuery

And the stack trace, which looks something like this:

skdstdst()+29 call kgdsdst() 7FFFA592F860 ? 000000000 ?
7FFFA5912120 ? 7FFFA5912238 ?
7FFFA592FF18 ? 7FFFA592F810 ?
ksedst()+112 call skdstdst() 7FFFA592F860 ? 000000000 ?
7FFFA5912120 ? 7FFFA5912238 ?
7FFFA592FF18 ? 7FFFA592F810 ?
dbkedDefDump()+1153 call ksedst() 000000000 ? 000000000 ?
7FFFA5912120 ? 7FFFA5912238 ?
7FFFA592FF18 ? 7FFFA592F810 ?
ksedmp()+26 call dbkedDefDump() 000000001 ? 000000000 ?
7FFFA5912120 ? 7FFFA5912238 ?
7FFFA592FF18 ? 7FFFA592F810 ?
dbkdaKsdActDriver() call ksedmp() 000000001 ? 000000000 ?
+880 7FFFA5912120 ? 7FFFA5912238 ?
7FFFA592FF18 ? 7FFFA592F810 ?
dbgdaExecuteAction( call dbkdaKsdActDriver() 7F482FDE8618 ? 7FFFA5932830 ?
)+319 7FFFA5912120 ? 7FFFA5912238 ?
7FFFA592FF18 ? 7FFFA592F810 ?
... ( rest omitted for clarity and we don't need the stack for the purpose of this post )

For my case, the interesting bit was the query that caused the error to be raised. This query provides us with some hints about what’s going on:

– We can see the DS_SVC comment at the beginning of the query
– We can see lots of hints, the more interesting ones being dynamic_sampling(0) and result_cache(snapshot=3600)

A little bit of research led to a new feature introduced in 12c: Dynamic Statistics (or Adaptive Statistics). This is an enhancement to dynamic sampling, where the optimizer can automatically choose to gather more statistics as part of the parse phase. The sampling is controlled internally with a time out, and if that time out is hit, ORA-10173 is raised to the alert log.

This means that these errors are generally safe to ignore, as they are raised purely internally and your application won’t see these exceptions. Your query didn’t, and won’t fail. However, your query may well be spending more time parsing and sampling data than what makes sense. My colleague Slava has already blogged about such a case here: Performance Problems with Dynamic Statistics in Oracle 12c and there are other documented cases.

The feature can be controlled through the optimizer_dynamic_sampling parameter. If it’s set to the default value of 2, the optimizer may choose to raise that temporarily to 11 which enables these new dynamic features. You can tell by looking at an execution plan of a query using dbms_xplan.display_cursor for example, and looking at the notes section:

Note
-----
- dynamic statistics used: dynamic sampling (level=AUTO)
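
For reference, the plan for the last statement executed in your session (including this notes section) can be pulled like this:

select * from table(dbms_xplan.display_cursor());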

As always, to determine whether or not this feature is helpful or harmful to your applications: benchmark it. You probably don’t want to turn it off system-wide, but it may make a lot of sense to disable it for certain sessions, or certain queries in your application. The ORA-10173 you may see in the alert log however is no reason to panic, if your application is not negatively impacted. If you are mostly running queries that take minutes or longer to execute, spending a couple seconds gathering more statistics may not impact you. If on the other hand your queries are fast, and Oracle spends more time parsing and sampling than it actually takes to execute the query, or you are seeing loads and loads of ORA-10173, you may want to take action. Alternatively you may also resort to setting a _fix_control as mentioned in Slava’s post – but as always, double-check with Oracle support if setting it has the desired effect on your exact version of Oracle.
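
A minimal sketch of both options (your_table is a placeholder; whether a session-level setting fully prevents the automatic escalation to level 11 is something to verify on your version):

-- statement level, using the same hint the internal DS_SVC queries use
select /*+ dynamic_sampling(0) */ count(*) from your_table;

-- session level
alter session set optimizer_dynamic_sampling = 0;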

In essence, the ORA-10173 indicates wasted work, because Oracle is spending time gathering data, hits the timeout and then throws the partial data away since it’s incomplete. This certainly isn’t optimal, but again, your mileage may vary if this really impacts your applications. In our case, we are seeing thousands of them, and we have opened a case with Oracle, since obviously somewhere the optimizer keeps making the wrong choice. The bug is currently still with BDE but I shall keep you posted if anything develops.

Another thing worth noting is that Oracle stores the results of these dynamic sampling queries in the result cache, and marks them for expiration after a certain time has elapsed. This timeout that we are seeing with the result_cache(snapshot=3600) hint, comes from the hidden parameter _optimizer_ads_result_cache_life which defaults to 3600 in 12.1.0.2. This may also help in reducing the frequency of the dynamic sampling queries by increasing the lifetime for which they remain cached. But as always, before changing any hidden parameters, please consult with your local Oracle support representative!
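
To simply read the current value of that hidden parameter without changing anything, the usual x$ query works (as SYS; a sketch):

select p.ksppinm name, v.ksppstvl value
from x$ksppi p, x$ksppcv v
where p.indx = v.indx
and p.ksppinm = '_optimizer_ads_result_cache_life';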

 

Discover more about our expertise in the world of Oracle.

Categories: DBA Blogs

What’s the Ratio of Your Marketing Budget to Cyber Security Budget?

Pythian Group - Sun, 2015-11-22 19:45

This was a question asked over Twitter by @DanBarker to TalkTalk, the major British telco, after it managed to lose a significant portion of its customers' details, apparently through an SQL injection attack by a 15-year-old.

The question wasn’t answered, but the sooner companies realise that a security incident can wipe out a significant part of the brand’s goodwill, the more this ratio will adjust.

Here are three top tips to ensure you’re investing wisely in cyber security, and protecting your brand’s good name:

1. Keep everything patched and up to date – old databases have security holes that can lead to an attack. A new client of ours was running SQL Server 2000 and failed a pen-test in 30 minutes. But it doesn't need to be that old to fail.

2. Audit and document everything. What data is where? Who or what has access? Do they need it? Are they still with the company? Not knowing what data might be lost was the major problem at the NSA post-Snowden. And within hours of the TalkTalk hack, the CEO said “I don’t know today whether all four million customers’ details have been stolen” (it was about 150,000 in the end, but by then the brand damage was done).

3. Check how employees and third party suppliers access your production environment, to make sure it’s from a safe, virus-free place. Can you see what they see? Do you know what they’re looking at?

Overall, to use Pythian’s tagline, just learn to “love your data”.

If your in-house team doesn’t have the skills and expertise to take care of these tasks, then find a company or contractor that does. The cost will be far less than a major security incident. And probably lower than your marketing budget too.

Categories: DBA Blogs

Trace Files -- 8b : More Performance Evaluation from Trace File

Hemant K Chitale - Sun, 2015-11-22 04:40
Continuing with the same table as in the previous example, but with the addition of
(a) Statistics on the table
(b) An Index on the CREATED column

SQL> select count(*) from all_objects_many_list;

COUNT(*)
----------
7197952

SQL>
SQL> select min(created) from all_objects_many_list;

MIN(CREAT
---------
28-AUG-11

SQL>
SQL> select min(created), max(created) from all_objects_many_list;

MIN(CREAT MAX(CREAT
--------- ---------
28-AUG-11 15-NOV-15

SQL>
SQL>

Let's examine the tkprof output first.

[oracle@ora11204 Hemant]$ tkprof /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_2982.trc QRYS_22NOV.PRF aggregate=NO

TKPROF: Release 11.2.0.4.0 - Development on Sun Nov 22 17:44:13 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.


[oracle@ora11204 Hemant]$

SQL ID: cwdmaym4gk0y2 Plan Hash: 2662061148

select count(*)
from
all_objects_many_list


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.02 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 1.79 34.67 97793 97798 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 1.79 34.70 97793 97798 0 1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 87
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
1 1 1 SORT AGGREGATE (cr=97798 pr=97793 pw=0 time=34677183 us)
7197952 7197952 7197952 TABLE ACCESS FULL ALL_OBJECTS_MANY_LIST (cr=97798 pr=97793 pw=0 time=48480880 us cost=26777 size=0 card=7197952)

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2 0.00 0.00
Disk file operations I/O 1 0.00 0.00
db file sequential read 1 0.00 0.00
direct path read 1542 0.20 33.81
SQL*Net message from client 2 0.00 0.00
********************************************************************************




SQL ID: 72fjkwbds0wxf Plan Hash: 3190284407

select min(created)
from
all_objects_many_list


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.01 3 3 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.02 3 3 0 1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 87
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
1 1 1 SORT AGGREGATE (cr=3 pr=3 pw=0 time=17972 us)
1 1 1 INDEX FULL SCAN (MIN/MAX) ALL_OBJ_M_L_CRTD_NDX (cr=3 pr=3 pw=0 time=17950 us cost=3 size=8 card=1)(object id 35037)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2 0.00 0.00
db file sequential read 3 0.00 0.01
SQL*Net message from client 2 0.00 0.00
********************************************************************************



SQL ID: dcrcwkktu08mv Plan Hash: 2662061148

select min(created), max(created)
from
all_objects_many_list


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 1.31 1.27 97792 97797 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 1.31 1.28 97792 97797 0 1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 87
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
1 1 1 SORT AGGREGATE (cr=97797 pr=97792 pw=0 time=1278898 us)
7197952 7197952 7197952 TABLE ACCESS FULL ALL_OBJECTS_MANY_LIST (cr=97797 pr=97792 pw=0 time=17474658 us cost=26854 size=57583616 card=7197952)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 3 0.00 0.00
direct path read 1542 0.00 0.16
SQL*Net message from client 3 14.02 14.02
SQL*Net break/reset to client 2 0.00 0.00
********************************************************************************


The first query (the count(*)) did a Full Table Scan and read 7.198 million rows, reading 97,793 blocks from disk with 97,798 consistent gets. The "pr" count is the count of Physical Reads and the "cr" count is the count of Logical Reads (there are no "cur" -- current mode block gets -- in this example). The Logical Reads count includes the Physical Reads count; do not attempt to add them.
During a Full Table Scan, Oracle may have to read the Segment Header and/or additional blocks to read the Extent Map.  Thus, some blocks may be read multiple times, the first reading may have been from Disk, subsequent readings from the Buffer Cache.
In this case, the Segment Header is identified as :
SQL> select header_file, header_block from dba_segments where owner = 'HEMANT' and segment_name = 'ALL_OBJECTS_MANY_LIST';

HEADER_FILE HEADER_BLOCK
----------- ------------
4 330

SQL>

and identified from the trace file as the 'db file sequential read' (single block read) as the first disk read before the 'direct path reads':
PARSING IN CURSOR #140647645766312 len=42 dep=0 uid=87 oct=3 lid=87 tim=1448185243704201 hv=3371762626 ad='6bd59a08' sqlid='cwdmaym4gk0y2'
select count(*) from all_objects_many_list
END OF STMT
PARSE #140647645766312:c=5999,e=33833,p=3,cr=39,cu=0,mis=1,r=0,dep=0,og=1,plh=2662061148,tim=1448185243704195
EXEC #140647645766312:c=0,e=48,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2662061148,tim=1448185243704326
WAIT #140647645766312: nam='SQL*Net message to client' ela= 5 driver id=1650815232 #bytes=1 p3=0 obj#=13246 tim=1448185243704401
WAIT #140647645766312: nam='Disk file operations I/O' ela= 39 FileOperation=2 fileno=4 filetype=2 obj#=35014 tim=1448185243704787
WAIT #140647645766312: nam='db file sequential read' ela= 6291 file#=4 block#=330 blocks=1 obj#=35014 tim=1448185243711101
--------------------------------------------------------------------------------------------------------------------------
WAIT #140647645766312: nam='direct path read' ela= 3626 file number=4 first dba=331 block cnt=13 obj#=35014 tim=1448185243714977
WAIT #140647645766312: nam='direct path read' ela= 5029 file number=4 first dba=345 block cnt=15 obj#=35014 tim=1448185243720084

Note the 'db file sequential read' of block#=330 from file number=4, marked above.
The third query also does a Full Table Scan but doesn't show a 'db file sequential read' wait for this block.  This Segment Header block was persisted in the Buffer Cache.  All the other blocks of the table read via 'direct path read' in the first Full Table Scan query were not persisted in the Buffer Cache.

The raw trace file in this case will NOT show the individual Logical Reads, only the Physical Reads when there are Waits.  However, the Logical Reads are visible in the STAT lines in the raw trace file.  These STAT lines for the first Full Table Scan query are visible as :
STAT #140647645766312 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=97798 pr=97793 pw=0 time=34677183 us)'
STAT #140647645766312 id=2 cnt=7197952 pid=1 pos=1 obj=35014 op='TABLE ACCESS FULL ALL_OBJECTS_MANY_LIST (cr=97798 pr=97793 pw=0 time=48480880 us cost=26777 size=0 card=7197952)'

These STAT lines show
(a) The Object ID for the Table being 35014
(b) The estimated COST for the Execution Plan was 26,777 (shown as "cost=").
(c) The estimated CARDINALITY (number of rows) from the step in the Execution Plan was 7,197,952 (shown as "card=")
(d) The elapsed time for the Full Table Scan was 48.481 million microseconds.
Note : Particularly on short-running steps in an Execution Plan, and with rounding of time, there may be some mismatch between the time reported individually in the raw trace and tkprof and the actual time.

If we look at the STAT line for the third query doing a Full Table Scan
STAT #140647645766312 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=97797 pr=97792 pw=0 time=1278898 us)'
STAT #140647645766312 id=2 cnt=7197952 pid=1 pos=1 obj=35014 op='TABLE ACCESS FULL ALL_OBJECTS_MANY_LIST (cr=97797 pr=97792 pw=0 time=17474658 us cost=26854 size=57583616 card=7197952)'

we note that the second Full Table Scan took 17.475 million microseconds. This is much less time than the first Full Table Scan (48.481 million). Thus, we can see that there was some impact of caching, either in the Storage Cache or the Server / FileSystem Cache, which enabled these Disk Reads to be faster.

Note the difference in the "cost=" and "size=" values between the first Full Table Scan and the second Full Table Scan.  The first Full Table Scan had cost=26,777 and size=0.  The second Full Table Scan had cost=26,854 and size=57,583,616.  The additional "size" and "cost" in the second Full Table Scan is for reading the actual values in the "CREATED" column.  The first Full Table Scan only had to count the number of rows and did not have to return any column values.  The second query had to interpret the rows and return the "CREATED" values.  This is evident if we look at the difference in the Explain Plans and note the "BYTES":

SQL> explain plan for select count(*) from all_objects_many_list;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2662061148

------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26777 (1)| 00:05:22 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| ALL_OBJECTS_MANY_LIST | 7197K| 26777 (1)| 00:05:22 |
------------------------------------------------------------------------------------

9 rows selected.

SQL> explain plan for select min(created), max(created) from all_objects_many_list;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2662061148

--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 26854 (1)| 00:05:23 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
| 2 | TABLE ACCESS FULL| ALL_OBJECTS_MANY_LIST | 7197K| 54M| 26854 (1)| 00:05:23 |
--------------------------------------------------------------------------------------------

9 rows selected.

SQL>

Note how the query on MIN(CREATED), MAX(CREATED) shows that it returns 54 million bytes (reading the CREATED column values for all the rows) and, therefore, has a higher Cost.



The Full Table Scans are executed as 'direct path read' operations and *not* reads from the database instance Buffer Cache in the SGA.  These Direct Reads are a result of the "serial direct read" behaviour of 11g used for large tables -- as noted in my previous blog post, this table is larger than the Buffer Cache.  Had it been a small table (or pre-11g), the reads would have been executed with 'db file scattered read' waits and loaded into the Buffer Cache, and it is possible that some of the blocks may have been present in the Buffer Cache for the third query.
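
To verify that size relationship on your own system, a quick comparison along these lines helps (a sketch):

SQL> select blocks from dba_tables where owner = 'HEMANT' and table_name = 'ALL_OBJECTS_MANY_LIST';
SQL> select name, current_size, buffers from v$buffer_pool;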


The second query, using an Index on the CREATED column did 3 block reads as 'db file sequential read'.

PARSING IN CURSOR #140647645766312 len=46 dep=0 uid=87 oct=3 lid=87 tim=1448185278404154 hv=3682628526 ad='7f931480' sqlid='72fjkwbds0wxf'
select min(created) from all_objects_many_list
END OF STMT
PARSE #140647645766312:c=4999,e=4087,p=1,cr=30,cu=0,mis=1,r=0,dep=0,og=1,plh=3190284407,tim=1448185278404148
EXEC #140647645766312:c=0,e=31,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3190284407,tim=1448185278404230
WAIT #140647645766312: nam='SQL*Net message to client' ela= 4 driver id=1650815232 #bytes=1 p3=0 obj#=35014 tim=1448185278404299
WAIT #140647645766312: nam='db file sequential read' ela= 5021 file#=1 block#=61217 blocks=1 obj#=35037 tim=1448185278409390
WAIT #140647645766312: nam='db file sequential read' ela= 6886 file#=1 block#=63109 blocks=1 obj#=35037 tim=1448185278416360
WAIT #140647645766312: nam='db file sequential read' ela= 5782 file#=1 block#=61218 blocks=1 obj#=35037 tim=1448185278422228
FETCH #140647645766312:c=0,e=17979,p=3,cr=3,cu=0,mis=0,r=1,dep=0,og=1,plh=3190284407,tim=1448185278422304
STAT #140647645766312 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=3 pr=3 pw=0 time=17972 us)'
STAT #140647645766312 id=2 cnt=1 pid=1 pos=1 obj=35037 op='INDEX FULL SCAN (MIN/MAX) ALL_OBJ_M_L_CRTD_NDX (cr=3 pr=3 pw=0 time=17950 us cost=3 size=8 card=1)'


The three single block reads (Root, Branch, Leaf) for object 35037 (the Index) are from file=1 which happens to be the SYSTEM tablespace.
SQL> l
1* select tablespace_name , blevel from dba_indexes where owner = 'HEMANT' and table_name = 'ALL_OBJECTS_MANY_LIST'
SQL> /

TABLESPACE_NAME BLEVEL
------------------------------ ----------
SYSTEM 2

SQL>


tkprof does not show this information about the index being in a different (actually wrong!) tablespace.  Only a reading of the raw trace file identifies this.  There are 3 blocks (Root, Branch, Leaf) to be read, and the BLEVEL indicates this.  Block 61217 is the block immediately after the Segment Header Block (61216).  Block 63109 is a Branch Block and Block 61218 is a Leaf Block.  This query was fully satisfied from the Index alone and didn't have to read any Table blocks.
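
If we wanted to correct that placement, an online rebuild into the intended tablespace would be the obvious fix (a sketch; the target tablespace name USERS is only an assumption here):

SQL> alter index ALL_OBJ_M_L_CRTD_NDX rebuild tablespace USERS online;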

.
.
.


Categories: DBA Blogs

Links for 2015-11-20 [del.icio.us]

Categories: DBA Blogs

Log Buffer #450: A Carnival of the Vanities for DBAs

Pythian Group - Fri, 2015-11-20 15:08

This Log Buffer Edition picks a few blog posts from Oracle, SQL Server and MySQL.

Oracle:

  • If you grant the DBA role to a user, Oracle also grants it the UNLIMITED TABLESPACE system privilege. If you then revoke the DBA role from this user, Oracle also revokes its UNLIMITED TABLESPACE system privilege.
  • Lost SYSMAN password OEM CC 12gR5.
  • How Terminal Emulation Assists Easy Data Management.
  • Using EMCLI List Verb to Get Detailed Information of EM Targets.
  • How to change apex_public_user password in ORDS.

SQL Server:

  • When the connection between you and the target host spans multiple servers across the continent, the latency will drive you mad.
  • SQLCMD and Batch File magic.
  • Greg Larson walks through the GUI installation process for SQL Server 2016 and explores the new installation options.
  • A Single-Parameter Date Range in SQL Server Reporting Services.
  • Is SQL Server killing your application’s performance?

MySQL:

  • MariaDB Galera Cluster 10.0.22 and Connector updates.
  • Building MaxScale from source on CentOS 7.
  • Orchestrator & Pseudo-GTID for binlog reader failover.
  • InnoDB holepunch compression vs the filesystem in MariaDB 10.1.
  • Open-sourcing PinLater: An asynchronous job execution system.

 

Learn more about Pythian's expertise in Oracle, SQL Server & MySQL.

Categories: DBA Blogs

Links for 2015-11-19 [del.icio.us]

Categories: DBA Blogs

Lost SYSMAN password OEM CC 12gR5

Hans Forbrich - Thu, 2015-11-19 10:42
I run my own licensed Oracle products in-house.  Since it is a very simple environment, largely used to learn how things run and verify what I see at customer sites, it is not very active at all.  But it is important enough to me to keep it maintained.

After a bit of a hiatus in looking at the OEM, which is at 12cR5 patched, I went back and noted that I was using the wrong password.  No problem, I thought: since OMS uses VPD and database security, just change the password in the database.

While I'm there, might as well change the SYSMAN password as well, since I have a policy of rotated passwords.

A few things to highlight (as a reminder for next time):


  • Use the right emctl.  There is an emctl for the OMS, the AGENT and the REPO DB.  In this case, I've installed the OMS under middleware, therefore  
    • /u01/app/oracle/middleware/oms/bin/emctl
  • Check the repository and the listener
  • Start the OMS.  
    • If the message is "Failed to connect to repository database. OMS will be automatically restarted once it identifies that database and listener are up." there are a few possibilities:
      • database is down
      • listener is down
    • If the message is "Connection to the repository failed. Verify that the repository connection information provided is correct." check whether 
      • SYSMAN password is changed or 
      • SYSMAN is locked out
  • To change the sysman password:
    • In database 
      • sqlplus / as sysdba
      • alter user SYSMAN identified by new_pwd account unlock;
    • In oms
      • ./emctl stop oms
      • ./emctl config oms -change_repos_pwd -use_sys_pwd -sys_pwd sys_pwd -new_pwd new_pwd
      • ./emctl stop oms 
      • ./emctl start oms
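If the OMS still refuses to connect after all this, a quick repository-side sanity check is to confirm the account status (a sketch, run as SYS on the repository database):

select username, account_status from dba_users where username = 'SYSMAN';
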
And test it out using the browser ...
Categories: DBA Blogs

How Terminal Emulation Assists Easy Data Management

Kubilay Çilkara - Wed, 2015-11-18 21:25
Just about every company will need terminal emulation at some point. Yours may not need it now, but as time goes on, the day will come when you need information that can only be accessed with an emulator. This software allows a computer to basically take on the functionality of an older version. Doing so makes it possible for the user to access data that would otherwise be impossible to find. If you’re not sold on the benefits of using this type of software, consider the following ways that it assists with making data management easier.
Obtain Data from Outdated Systems
This is the most obvious way a terminal emulator helps with data management. Right now, you could have all kinds of data out of reach because it was created with software you no longer use, or stored on a platform that differs from the current one on your system.

Without an emulator, you’re basically without options. You simply have to find a workaround. The best solution would involve using a machine with the old operating system installed. This isn’t just extremely inconvenient though; it also isn’t very cost-effective and is sure to become a bottleneck sooner or later.

With terminal emulation, no data ever has to be out of reach. Whether it's information from 10 years ago or 20, you can quickly retrieve whatever it is you need.
Access Multiple Terminals at Once
There's far more these applications can do to assist with data management, though. Over time, your company has probably gone through multiple platforms, or will someday. This means that going back and grabbing the data you need could involve more than one system. If you tried using the aforementioned workaround, you'd be in for a huge challenge: it would take multiple computers, each with a specific operating system, and going between them to get the information you need or cross-reference it as necessary.

Modern emulators can access as many terminals as you need, all on the same computer. Usually, it just involves putting each one on a separate tab. Not only can you get all the info you need, then, you can do it from one screen, all at once. This makes it extremely easy to cross-reference data from one system against others.
Customize Your Queries
Another great benefit that comes with terminal emulation is that you can actually customize your searches to various degrees. For many companies, accessing old data means looking at screens that represent the info in the most rudimentary of ways. There may only be a few colors used for fonts on a black screen. Obviously, this can make data management a bit of a challenge, to say the least.

With the right software, though, you can control the font color, the character size, the background and more. This makes it much easier to see the information you want, highlight it and otherwise keep track of the data. Never again suffer through old screens from outdated platforms when you need something.
Mobile Functionality
Like just about everything these days, terminal emulators have now gone mobile. You can now pull off emulation from just about anywhere in the world on your mobile device. This is a great way to make emulation possible for any work at home employees who may be working for your company. If you hire on a consultant or freelance professional from somewhere outside the office, mobile functionality means they can now benefit from easy data management. Next time you’re on a business trip and need to access information from yesteryear, the ability to do so will be at your fingertips.

Mobile functionality may not seem like the most important aspect to have as far as emulation is concerned, but it’s better to have the option there than wish it was possible.
Save Money
Data management can be a costly endeavor. Unfortunately, it’s one of those costs your company really can’t hope to avoid. You need to manage your data, so you better find room in your budget. With terminal emulation, you can at least save money on this essential process.

Like we brought up earlier, without an emulator, you’re looking at a lot of hardware in order to make sure vital data is never out of reach, to say nothing of how inconvenient that option would be. You’re also looking at the potential costs of losing that data if anything happens to your dedicated machines. Furthermore, added hardware always comes with extra costs. There’s the space you need, electricity, IT support and more.

In light of those added expenses, simply using emulation software should be a no-brainer. You don’t need any extra hardware and these platforms are designed to stay out of the way until you need them, so they won’t hurt your staff’s ability to use their current machines.
Limitless Scalability
No matter what kind of software we're talking about, scalability always needs to be a concern. Today, your company may only have so much data to manage. A year from now, though, there's going to be a lot more. Five years from now, your company's collection of data will be expansive.

Obviously, managing this data efficiently is going to take a resource that can scale without difficulty. Keep in mind that a lot of companies increase their amount of data exponentially. This means scalability is necessary, but so is being able to scale at a moment's notice to whatever size is needed.

Terminal emulation and scalability are virtually inseparable when you have the right software. No alternative solution is going to be able to keep up. Again, if you tried using hardware to manage your data, you can forget about easy scalability and forget about doing so without spending a lot of money in the process.


Data management needs to be a priority for every organization, no matter what industry they’re in. However, simple data management isn’t enough anymore. Instead, you need emulation software that will make it easy, cost-effective and scalable. Otherwise, your business will always be greatly limited in what information it can access and the benefits that could be derived from it.

Mike Miranda writes about enterprise software and covers products offered by software companies like Rocket Software, on topics such as terminal emulation, enterprise mobility and more.
Categories: DBA Blogs

Pythian Champions Gender Diversity in Tech

Pythian Group - Wed, 2015-11-18 13:45

 

At Pythian, we embody and value diversity in every form—in our religion (or lack thereof), our geography, our culture, and of course the location we’ve chosen for our home.  We deeply believe that the best teams are diverse and we are committed to increasing our diversity in all these areas.

We’ve been doing time-zone diversity for over a decade and we’ll keep doing it—we’re actually pretty good at it. In support of time zone diversity, we’ve become an industry leader in the art of creating high-performance teams by blending people from all over the world regardless of religion, culture, country of origin or residence. But now it’s time for us to turn our attention to gender diversity.

And so, with this note and video, I am proud to announce the Pythia Program. We’re going to increase the number of brilliant women who work with us and we will take a leadership position in a very important conversation to encourage more women to enter STEM fields and the tech industry.

Inspired by Microsoft, Twitter, Pinterest and Google, Pythian is the first technology company headquartered in Canada to announce its gender stats: 35 percent of the company’s leadership team are women, 27 percent of our managers are women and only 9 percent of technical roles at Pythian are held by women. Pythian also claims a Pythia Index of 56 percent.

The Pythia Index is the ratio of people in a business or in a team that are women leaders or roll up to a woman leader, as a percentage of total headcount. At Pythian the Pythia Index is 56%, which means that 56% of Pythianites are themselves women leaders or report up the org chart to a woman. Conversely, 44% of Pythian employees have zero women leaders all the way up the org chart from them.

So how do we currently compare to industry leaders? According to the most recent published numbers, Microsoft reported that women comprise 29.1 percent of its workforce, but only 16.6 percent work in technical positions and just 23 percent hold leadership roles. Twitter said women fill 10 percent of its technical jobs, with 21 percent in leadership. And women Googlers account for 17 percent of the search giant’s tech jobs, while only 21 percent manage others. Industry averages among all surveyed major US tech firms are 15.6% of technical roles and 29.6% of leadership.*

With this announcement, I am challenging our business to surpass industry averages within no more than three years, and sooner if possible.

The Pythia Program is central to how we plan to realize the best possible talent mix. By acknowledging and strengthening our deliberate talent blending practices, we hope not only to garner interest from potential employees, but to encourage other businesses to set similar diversity goals. This is not a corporate social responsibility initiative, it’s good for business, and it’s good for the technology sector.

Under the Pythia program we will:

  • Continue to focus on eliminating any unconscious gender biases in our recruiting and management practices.
  • Actively promote Pythian as a diversity-focused company so we can get more than our fair share of the top female leaders and tech talent.
  • Help build our future pipeline of female leaders and tech talent by working with and supporting organizations who have programs to encourage more women to go into tech.

This project is very important to me personally and I’ll continue to blog on the subject in the future.

 

Discover more about the Pythia Program.

 

*Source: cnet.com

Categories: DBA Blogs

High “cursor: pin S wait on X” waits?

Pythian Group - Tue, 2015-11-17 14:59

If your system meets any of the following criteria:

– Oracle 12.1.0.2.0 or higher
– Partitioned tables
– Parallel query heavily used
– Bind variables in use, as they should be

and, you’re seeing unusually high “cursor: pin S wait on X” waits, then you may want to know that this week, patch 21834574 was released.

To give you a little bit of background: the issue is with a new code path introduced in Oracle version 12, which is related to some of the brand new information life-cycle management (ILM) functionality. This ILM feature – whether you are intentionally using any ILM functionality or not – causes delays when a new child cursor is created in the library cache. On a client's system we have observed waits of up to 0.3 seconds.

In a nutshell, here’s what’s happening:

– A new parallelized query using bind variables is hard parsed.
– Oracle creates a new cursor, and adds a child cursor.
– Each of your PX slaves will then wait for the following event: “cursor: pin S wait on X”.
– The mutex being waited on is of type Cursor Pin at the location “kkslce [KKSCHLPIN2]”. This is normal and these waits can’t be fully avoided.

But what’s not normal is that the mutex is being waited on for a quarter of a second in each slave session.
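
One way to confirm that you are hitting this particular location is to check the mutex sleep statistics (a sketch):

select mutex_type, location, sleeps
from v$mutex_sleep
where location like 'kkslce%'
order by sleeps desc;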

Oracle has now released a patch that implements a new _fix_control, which can be set to enable the fix once the patch is applied. As always, please consult with Oracle Support before applying this patch and setting this parameter, to make sure that you really are seeing this particular bug. There are others in the current versions of Oracle which share very similar symptoms, and the only way to be certain is to double-check with Oracle Support or development.
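
Once the patch is applied and Support has confirmed you are hitting this bug, enabling the fix would look something like this (a sketch; that the fix-control number matches the patch number is an assumption you must verify with Oracle Support for your exact version):

alter system set "_fix_control" = '21834574:1';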

Happy patching!

 

Discover more about our expertise in the world of Oracle

Categories: DBA Blogs

IoT, Mobility and Cloud Integration Hands-On: Controlling Raspberry Pi GPIO Using Oracle MAF App and Oracle ADF Application Deployed on Oracle Java Cloud Service (Part I)

The Internet of Things ecosystem is continuously changing our world and opening a New Digital Era for business. Billions of different sensors on millions of devices collecting information on everything...

We share our skills to maximize your revenue!
Categories: DBA Blogs