DBA Blogs

Tip of the day: Always put this in your .bashrc

RDBMS Insight - Mon, 2016-05-09 06:52

if you like to scp:

# If not running interactively, don't do anything
[[ $- == *i* ]] || return

Otherwise scp will fail without error (a known issue): scp runs a non-interactive remote shell, and any output your .bashrc produces corrupts the scp protocol.

Categories: DBA Blogs

MySQL InnoDB’s Full Text Search overview

Pythian Group - Fri, 2016-05-06 12:56

NOTE: If you want to read and play with the interactive application, please go to the shinnyapps article. It was developed using Shiny/R to let you see the effects of the algorithms.

Thanks to Valerie Parham-Thompson at Pythian and Daniel Prince at Oracle.

The GitHub repository contains the code to generate and load the data, as well as the Shiny/R code.

Some initial thoughts

A couple of weeks ago one of our customers came up with a question regarding FTS over the InnoDB engine. Although that question is not answered in the current article, it led me to the conclusion that FTS is sometimes misunderstood.

The point of this article is to show dynamically how the search algorithms work, using non-fictional data (the data sources were downloaded from the Gutenberg project) within an easy interface (please see the bottom of the ShinnyApps post here).

In order to show the effect of field sizes on the query expansion algorithm, you will see two main tables (bookContent and bookContentByLine), both containing the same books stored in two different ways: by line and by paragraph. You'll see the noise generated by the QUERY EXPANSION algorithm when phrases are too large.

For the sake of simplicity, in this article we won’t go through the FTS parsers. That is possible material for a future post.

Why do I consider FTS sometimes misunderstood?

FTS is a technology that can be used for many purposes, not only simple searches. Generally, FTS engines are deployed as a service for web or document searches, which usually calls for technologies like Solr, ElasticSearch or Sphinx. However, certain business rules require complex searches, and having such a feature inside the RDBMS can be a win.

An RDBMS isn't a good place for a massive volume of FTS queries if you are not using any of the join capabilities it offers, or its ACID properties.

As I said above, FTS is totally acceptable in an RDBMS if you are using at least one main RDBMS feature required by your business model.

Action!

To start showing the effects of the algorithms, the following example searches the word 'country' using query expansion. This means that we are not looking only at the exact matches, but also at the entries that appear most often in rows where the exact match has been found.

In the SELECT clause you'll see both FTS expressions, using NATURAL LANGUAGE mode with query expansion and BOOLEAN mode respectively.

View the code on Gist.
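As an illustration, a query along these lines reproduces the comparison (this is only a sketch based on the article's description; the content column name is an assumption, not the exact Gist code):

SELECT bookid,
       MATCH(content) AGAINST ('country' WITH QUERY EXPANSION) AS QERank,
       MATCH(content) AGAINST ('country' IN BOOLEAN MODE)      AS BoolRank
FROM bookContentByLine
WHERE MATCH(content) AGAINST ('country' WITH QUERY EXPANSION)
ORDER BY QERank DESC
LIMIT 10;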

The noise generated by the query expansion is expected and described in the official documentation here.

The interesting case is the following row, which has two exact occurrences (you can see positions 1 and 63) and yet is not the highest rank using query expansion. Remember, this is expected.


Text: "country districts. As Lucca had five gates, he divided his own country"
bookid: 1232
pos: 1,63
QERank: 80
BoolRank: 14

This is even worse when using large sentences. In the example below you will see the same query, this time against the table that stores one paragraph per row. The boolean rank shows some of the entries way above the others, while query expansion places at the top records that do not necessarily have many exact matches.

View the code on Gist.

Query expansion is useful when you intend to find entries containing words that frequently appear together with the search term. Having large text fields increases the probability of having more words that appear alongside the search term. In the case of the bookContent table (the by-paragraph table), the average field size is 443.1163 characters.

The INNODB_FT_INDEX_TABLE

There is a way to play with the contents of the FTS indexes. As you may have noticed in the previous examples, I used the set global innodb_ft_aux_table = 'ftslab/bookContent'; statement, which loads the index content into memory for easy querying.

If you use RDS, the innodb_ft_aux_table option is not available, as it is GLOBAL and requires the SUPER privilege.

For example, you can easily get the most frequent tokens:

View the code on Gist.
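A rough sketch of that kind of query, run against INFORMATION_SCHEMA once innodb_ft_aux_table has been set as shown above (not the exact Gist code):

SELECT word, COUNT(*) AS occurrences, MAX(doc_count) AS docs_containing_word
FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE
GROUP BY word
ORDER BY occurrences DESC
LIMIT 20;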

We can query the index contents with a simple SQL statement like the following:

View the code on Gist.
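For instance, a sketch of such a statement, pulling the positions of a single token (again assuming innodb_ft_aux_table is still set):

SELECT word, doc_id, position
FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE
WHERE word = 'country'
ORDER BY doc_id, position;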

In the examples shown above there is no intention to compare the rank scores, as they are based on different algorithms. The idea is to show that QUERY EXPANSION can produce undesired results in some cases due to its mechanism.

Building custom stopwords

This probably isn't very useful information, as most of these words appear too frequently and are modal verbs, adverbs, pronouns, determiners, etc. It could be the case that you are not interested in indexing those words. If so, you can add them as stopwords in your own stopwords table, especially if you are more interested in boolean searches and can afford to lose some of the language expressions.

We can build a custom stopwords table based on our current data:

View the code on Gist.

Let’s build our stopwords table using both default and new entries and keeping the alphabetical order:

View the code on Gist.
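A minimal sketch of that approach (the table name, column size and frequency threshold here are my own assumptions; note that a user-defined stopword table must consist of a single VARCHAR column named value and must be configured before the FULLTEXT index is created):

CREATE TABLE ftslab.my_stopwords (value VARCHAR(30)) ENGINE=InnoDB;

INSERT INTO ftslab.my_stopwords (value)
SELECT value FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD
UNION
SELECT word FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE
GROUP BY word HAVING COUNT(*) > 10000
ORDER BY 1;

SET GLOBAL innodb_ft_server_stopword_table = 'ftslab/my_stopwords';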

The idea behind choosing our own stopwords is to measure how much index space we save by filtering out those words that are extremely frequent and don't add meaning to the search. This topic could be covered in a separate blog post.

Going ahead on choosing stop words

The full article is amazingly interesting. In brief, it says that the most frequent word will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, and so on (rank-frequency distribution is an inverse relation).

Considerations and recommendations

– Use QUERY EXPANSION only if you are interested in searching for relations over exact matches. Remember that the field size is crucial when using it.
– FTS is not the best fit for exact string matches in single columns. You don't want to use FTS for searching e-mail addresses in a single column, or name and last name fields, for example. For those, you'll probably use other techniques such as reverse searches, the exact match operator (=) or hashing (e.g. CRC32 for e-mails or texts shorter than 255 characters).
– Keep your FTS indexes short. Do not add ALL the text columns. Parse the user's search in your application first and adapt the query.
– If you are using BOOLEAN MODE, you can use the rank score to filter rows. MySQL is clever enough to optimize the FTS functions to avoid double executions. You can do this using something like: match(content,title) against ("first (third)") > 1. Generally, scores lower than 1 can be ignored when using boolean or natural language mode searches.
– `OPTIMIZE TABLE` does a full rebuild of the table. To avoid this, set innodb_optimize_fulltext_only=1 in order to do incremental maintenance of the FULLTEXT index instead.
– Recall that NATURAL LANGUAGE MODE does not interpret the operators the way BOOLEAN MODE does. This affects the ranking score (try +bad (thing), for example).
– If you plan to order by rank, it is not necessary to specify an `ORDER BY` clause, as InnoDB does the ordering after retrieving the doc ids. Also, the behavior differs from the default ordering in that it returns the heaviest ranks at the top (like an ORDER BY rank DESC).
– If you come from MyISAM’s FTS implementation, recall that the ranking scoring is different.
– Create the FULLTEXT index after the data is loaded (see InnoDB Bulk Load). Otherwise, when restoring FTS backups, you will probably hit "ERROR 182 (HY000) at line nn: Invalid InnoDB FTS Doc ID".
– Try to avoid using more than one FTS expression in the WHERE clause. Keep in mind that this affects the ordering of the results and consumes a considerable amount of CPU. InnoDB orders by the latest expression in the WHERE clause. WL#7123.
– Also, avoiding the rank information in the projection (SELECT clause) and using other aggregations like count(*) will make use of the "no ranking" FT_hints. The LIMIT hint won't be used if an ORDER BY and the MATCH clause in the projection are invoked explicitly.

View the code on Gist.

– If you plan to use the FTS_DOC_ID column with the AUTO_INCREMENT option, keep in mind that there is a limitation: you must declare it with a single-column PRIMARY KEY constraint or a UNIQUE index, and the data type is restricted to `bigint unsigned`. For example:

View the code on Gist.
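A minimal sketch of a table that satisfies those requirements (the table and column names here are made up for illustration; the unique index must be named FTS_DOC_ID_INDEX and the column must be BIGINT UNSIGNED NOT NULL):

CREATE TABLE demo_fts (
  FTS_DOC_ID BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  content TEXT,
  UNIQUE KEY FTS_DOC_ID_INDEX (FTS_DOC_ID),
  FULLTEXT KEY ft_content (content)
) ENGINE=InnoDB;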

FT_QUERY_EXPANSION_LIMIT

This variable controls the number of top matches when using `WITH QUERY EXPANSION` (affects only MyISAM). Reference.

Bug 80347 – Invalid InnoDB FTS Doc ID


emanuel@3laptop ~/sandboxes/rsandbox_5_7_9 $ ./m dumpTest < full.dump
ERROR 182 (HY000) at line 73: Invalid InnoDB FTS Doc ID

emanuel@3laptop ~/sandboxes/rsandbox_5_7_9 $ ./m dumpTest < ddl.dump
emanuel@3laptop ~/sandboxes/rsandbox_5_7_9 $ ./m dumpTest < onlyData.dump
emanuel@3laptop ~/sandboxes/rsandbox_5_7_9 $ ./m dumpTest < full.dump
ERROR 182 (HY000) at line 73: Invalid InnoDB FTS Doc ID

mysqldump is not very clever if you use `FTS_DOC_ID`:


2016-02-13T22:11:53.125300Z 19 [ERROR] InnoDB: Doc ID 10002 is too big. Its difference with largest used Doc ID 1 cannot exceed or equal to 10000

It takes dumps without considering the restriction coded in `innobase/row/row0mysql.cc`:


Difference between Doc IDs are restricted within
4 bytes integer. See fts_get_encoded_len()

The workaround is to back up the table in chunks of 10,000 documents.

Other useful links

Fine tuning
Performance
Maintenance: innodb_optimize_fulltext_only
Writing FTS parser plugins

Categories: DBA Blogs

Links for 2016-05-05 [del.icio.us]

Categories: DBA Blogs

Partner Webcast – Oracle BI cloud service: Insights at Your Fingertips

As business is constantly changing, we find ourselves wondering: are we prepared for this? In all enterprises, one of the most feared questions is "What caused this recent dip in sales?"...

We share our skills to maximize your revenue!
Categories: DBA Blogs

Comparing Common Queries Between Test and Production

Bobby Durrett's DBA Blog - Thu, 2016-05-05 13:58

The developers complained that their test database was so much slower than production that they could not use it to really test whether their batch processes would run fast enough when migrated to production. They did not give me any particular queries to check. Instead they said that the system was generally too slow. So, I went through a process to find SQL statements that they had run in test and that normally run in production and compare their run times. I thought that I would document the process that I went through here.

First I found the top 100 queries by elapsed time on both the test and production databases using this query:

column FORCE_MATCHING_SIGNATURE format 99999999999999999999

select FORCE_MATCHING_SIGNATURE from
(select
FORCE_MATCHING_SIGNATURE,
sum(ELAPSED_TIME_DELTA) total_elapsed
from DBA_HIST_SQLSTAT
where 
FORCE_MATCHING_SIGNATURE is not null and
FORCE_MATCHING_SIGNATURE <>0
group by FORCE_MATCHING_SIGNATURE
order by total_elapsed desc)
where rownum < 101;

The output looked like this:

FORCE_MATCHING_SIGNATURE
------------------------
      944718698451269965
     4634961225655610267
    15939251529124125793
    15437049687902878835
     2879196232471320459
    12776764566159396624
    14067042856362022182
...

Then I found the signatures that were in common between the two lists.
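The test_sigs and prod_sigs work tables used below each hold just the signature column; a likely definition, mirroring the common_sigs table created later, would be:

create table test_sigs
(FORCE_MATCHING_SIGNATURE number);

create table prod_sigs
(FORCE_MATCHING_SIGNATURE number);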

insert into test_sigs values (944718698451269965);
insert into test_sigs values (4634961225655610267);
insert into test_sigs values (15939251529124125793);
...
insert into prod_sigs values (3898230136794347827);
insert into prod_sigs values (944718698451269965);
insert into prod_sigs values (11160330134321800286);
...
select * from test_sigs
intersect
select * from prod_sigs;

This led to 32 values of FORCE_MATCHING_SIGNATURE which represented queries that ran on both test and production, except for the possible difference in constants.

Next I looked at the overall performance of these 32 queries in test and production using this query:

create table common_sigs
(FORCE_MATCHING_SIGNATURE number);

insert into common_sigs values (575231776450247964);
insert into common_sigs values (944718698451269965);
insert into common_sigs values (1037345866341698119);
...

select 
sum(executions_delta) total_executions,
sum(ELAPSED_TIME_DELTA)/(sum(executions_delta)*1000),
sum(CPU_TIME_DELTA)/(sum(executions_delta)*1000),
sum(IOWAIT_DELTA)/(sum(executions_delta)*1000),
sum(CLWAIT_DELTA)/(sum(executions_delta)*1000),
sum(APWAIT_DELTA)/(sum(executions_delta)*1000),
sum(CCWAIT_DELTA)/(sum(executions_delta)*1000),
sum(BUFFER_GETS_DELTA)/sum(executions_delta),
sum(DISK_READS_DELTA)/sum(executions_delta),
sum(ROWS_PROCESSED_DELTA)/sum(executions_delta)
from DBA_HIST_SQLSTAT ss,common_sigs cs
where 
ss.FORCE_MATCHING_SIGNATURE = cs.FORCE_MATCHING_SIGNATURE;

Here is part of the output:

TOTAL_EXECUTIONS Elapsed Average ms CPU Average ms IO Average ms
---------------- ------------------ -------------- -------------
         5595295         366.185529      241.92785    59.8682797
          430763         1273.75822     364.258421    1479.83294

The top line is production and the bottom is test.

This result supported the development team’s assertion that test was slower than production. The 32 queries averaged about 3.5 times longer run times in test than in production. Also, the time spent on I/O was about 25 times worse. I am not sure why the I/O time exceeded the elapsed time on test. I guess it has something to do with how Oracle measures I/O time. But clearly on average these 32 queries are much slower on test and I/O time probably caused most of the run time difference.

After noticing this big difference between test and production I decided to get these same sorts of performance metrics for each signature to see if certain ones were worse than others. The query looked like this:

select 
ss.FORCE_MATCHING_SIGNATURE,
sum(executions_delta) total_executions,
sum(ELAPSED_TIME_DELTA)/(sum(executions_delta)*1000),
sum(CPU_TIME_DELTA)/(sum(executions_delta)*1000),
sum(IOWAIT_DELTA)/(sum(executions_delta)*1000),
sum(CLWAIT_DELTA)/(sum(executions_delta)*1000),
sum(APWAIT_DELTA)/(sum(executions_delta)*1000),
sum(CCWAIT_DELTA)/(sum(executions_delta)*1000),
sum(BUFFER_GETS_DELTA)/sum(executions_delta),
sum(DISK_READS_DELTA)/sum(executions_delta),
sum(ROWS_PROCESSED_DELTA)/sum(executions_delta)
from DBA_HIST_SQLSTAT ss,common_sigs cs
where ss.FORCE_MATCHING_SIGNATURE = cs.FORCE_MATCHING_SIGNATURE
having 
sum(executions_delta) > 0
group by
ss.FORCE_MATCHING_SIGNATURE
order by
ss.FORCE_MATCHING_SIGNATURE;

I put together the outputs from running this query on test and production and lined the result up like this:

FORCE_MATCHING_SIGNATURE    PROD Average ms    TEST Average ms
------------------------ ------------------ ------------------
      575231776450247964         20268.6719         16659.4585
      944718698451269965         727534.558          3456111.6 *
     1037345866341698119         6640.87641         8859.53518
     1080231657361448615         3611.37698         4823.62857
     2879196232471320459         95723.5569         739287.601 *
     2895012443099075884         687272.949         724081.946
     3371400666194280661         1532797.66         761762.181
     4156520416999188213         109238.997         213658.722
     4634693999459450255          4923.8897         4720.16455
     5447362809447709021         2875.37308          2659.5754
     5698160695928381586         17139.6304         16559.1932
     6260911340920427003         290069.674         421058.874 *
     7412302135920006997         20039.0452         18951.6357
     7723300319489155163         18045.9756         19573.4784
     9153380962342466451         1661586.53         1530076.01
     9196714121881881832         5.48003488         5.13169472
     9347242065129163091         4360835.92         4581093.93
    11140980711532357629         3042320.88         5048356.99
    11160330134321800286         6868746.78         6160556.38
    12212345436143033196          5189.7972         5031.30811
    12776764566159396624         139150.231         614207.784  *
    12936428121692179551         3563.64537         3436.59365
    13637202277555795727          7360.0632         6410.02772
    14067042856362022182         859.732015         771.041714
    14256464986207527479         51.4042938         48.9237251
    14707568089762185958         627.586095          414.14762
    15001584593434987669         1287629.02         1122151.35
    15437049687902878835         96014.9782         996974.876  *
    16425440090840528197         48013.8912         50799.6184
    16778386062441486289         29459.0089         26845.8327
    17620933630628481201         51199.0511         111785.525  *
    18410003796880256802         581563.611         602866.609

I put an asterisk (*) beside the six queries that were much worse on test than production. I decided to focus on these six to get to the bottom of the reason for the difference. Note that many of the 32 queries ran about the same on test as prod, so it really isn't the case that everything was slow on test.

Now that I had identified the 6 queries I wanted to look at what they were spending their time on including both CPU and wait events. I used the following query to use ASH to get a profile of the time spent by these queries on both databases:

select 
case SESSION_STATE
when 'WAITING' then event
else SESSION_STATE
end TIME_CATEGORY,
(count(*)*10) seconds
from DBA_HIST_ACTIVE_SESS_HISTORY
where 
FORCE_MATCHING_SIGNATURE in
('944718698451269965',
'2879196232471320459',
'6260911340920427003',
'12776764566159396624',
'15437049687902878835',
'17620933630628481201')
group by SESSION_STATE,EVENT
order by seconds desc;

The profile looked like this in test:

TIME_CATEGORY            SECONDS
------------------------ -------
db file parallel read     207450
ON CPU                    141010
db file sequential read    62990
direct path read           36980
direct path read temp      29240
direct path write temp     23110

The profile looked like this in production:

TIME_CATEGORY            SECONDS
------------------------ -------
ON CPU                    433260
PX qref latch              64200
db file parallel read      35730
db file sequential read    14360
direct path read           12750
direct path write temp     12000

So, I/O waits dominate the time on test but not production. Since db file parallel read and db file sequential read were the top I/O waits for these 6 queries, I used ASH to see which of the 6 spent the most time on these waits.

db file parallel read:

select
sql_id,
(count(*)*10) seconds
from DBA_HIST_ACTIVE_SESS_HISTORY
where
FORCE_MATCHING_SIGNATURE in
('944718698451269965',
'2879196232471320459',
'6260911340920427003',
'12776764566159396624',
'15437049687902878835',
'17620933630628481201') and
event='db file parallel read'
group by sql_id
order by seconds desc;

SQL_ID           SECONDS
------------- ----------
ak2wk2sjwnd34     159020
95b6t1sp7y40y      37030
brkfcwv1mqsas      11370
7rdc79drfp28a         30

db file sequential read:

select
sql_id,
(count(*)*10) seconds
from DBA_HIST_ACTIVE_SESS_HISTORY
where
FORCE_MATCHING_SIGNATURE in
('944718698451269965',
'2879196232471320459',
'6260911340920427003',
'12776764566159396624',
'15437049687902878835',
'17620933630628481201') and
event='db file sequential read'
group by sql_id
order by seconds desc;

SQL_ID           SECONDS
------------- ----------
95b6t1sp7y40y      26840
ak2wk2sjwnd34      22550
6h0km9j5bp69t      13300
brkfcwv1mqsas        170
7rdc79drfp28a        130

Two queries stood out as the top waiters on these two events: 95b6t1sp7y40y and ak2wk2sjwnd34. Then I just ran my normal sqlstat query for both sql_ids for both test and production to find out when they last ran. Here is what the query looks like for ak2wk2sjwnd34:

select ss.sql_id,
ss.plan_hash_value,
sn.END_INTERVAL_TIME,
ss.executions_delta,
ELAPSED_TIME_DELTA/(executions_delta*1000) "Elapsed Average ms",
CPU_TIME_DELTA/(executions_delta*1000) "CPU Average ms",
IOWAIT_DELTA/(executions_delta*1000) "IO Average ms",
CLWAIT_DELTA/(executions_delta*1000) "Cluster Average ms",
APWAIT_DELTA/(executions_delta*1000) "Application Average ms",
CCWAIT_DELTA/(executions_delta*1000) "Concurrency Average ms",
BUFFER_GETS_DELTA/executions_delta "Average buffer gets",
DISK_READS_DELTA/executions_delta "Average disk reads",
ROWS_PROCESSED_DELTA/executions_delta "Average rows processed"
from DBA_HIST_SQLSTAT ss,DBA_HIST_SNAPSHOT sn
where ss.sql_id = 'ak2wk2sjwnd34'
and ss.snap_id=sn.snap_id
and executions_delta > 0
and ss.INSTANCE_NUMBER=sn.INSTANCE_NUMBER
order by ss.snap_id,ss.sql_id;

I found two time periods where both of these queries were recently run on both test and production and got an AWR report for each time period to compare them.

Here are a couple of pieces of the AWR report for the test database:

[AWR screenshot: Top 5 Timed Foreground Events, test database]

[AWR screenshot: SQL ordered by Elapsed Time, test database]

Here are similar pieces for the production database:

[AWR screenshot: Top 5 Timed Foreground Events and SQL ordered by Elapsed Time, production database]

What really stood out to me was that the wait events were so different. In production the db file parallel read waits averaged around 1 millisecond and the db file sequential reads averaged under 1 ms. On test they were 26 and 5 milliseconds, respectively. The elapsed times for sql_ids 95b6t1sp7y40y and ak2wk2sjwnd34 were considerably longer in test.

This is as far as my investigation went. I know that the slowdown is most pronounced on the two queries and I know that their I/O waits correspond to the two wait events. I am still trying to find a way to bring the I/O times down on our test database so that it more closely matches production. But at least I have a more narrow focus with the two top queries and the two wait events.

Bobby

Categories: DBA Blogs

InnoDB flushing and Linux I/O

Pythian Group - Thu, 2016-05-05 12:06

Since the documentation is not very clear to me on the topic of InnoDB flushing in combination with Linux I/O (specifically, the write system call), I decided to put together this article in hopes of shedding some light on the matter.

How Linux does I/O

By default, the write() system call returns after all data has been copied from the user space file descriptor into the kernel space buffers. There is no guarantee that data has actually reached the physical storage.

The fsync() call is our friend here. This will block and return only after the data and metadata (e.g. file size, last update time) is completely transferred to the actual physical storage.

There is also fdatasync() which only guarantees the data portion will be transferred, so it should be faster.

There are a few options that we can specify at file open time, that modify the behaviour of write():

O_SYNC

In this case, the write() system call will still write data to kernel space buffers, but it will block until the data is actually transferred from the kernel space buffers to the physical storage. There is no need to call fsync() after.

O_DIRECT

This completely bypasses the kernel space buffers, but requires that writes are aligned to and a multiple of the underlying storage block size (usually 512 bytes or 4k). By itself, it does not guarantee that the data is completely transferred to the device when the call returns.

O_SYNC + O_DIRECT

As stated above, we would need to use both options together to guarantee true synchronous I/O.

Relation with InnoDB flushing

The innodb_flush_method parameter controls which of these options will be used by MySQL when writing to files.

At the time of this writing, we have the following options:

NULL

This is the default value, and is equivalent to specifying fsync option.

fsync

Both data and redo log files will be opened without any special options, and fsync() will be used when the db needs to make sure the data is transferred to the underlying storage.

O_DSYNC

This one is confusing, as O_DSYNC is actually replaced with O_SYNC within the source code before calling open(). It is mentioned this is due to some problems on certain Unix versions. So O_SYNC will be used to open the log files, and no special options for the datafiles. This means fsync() needs to be used to flush the data files only.

O_DIRECT

Data files are opened with O_DIRECT. Log files are opened with no extra options. Some filesystems (e.g. XFS) do not guarantee metadata without the fsync() call, so it is still used as a safety measure.

O_DIRECT_NO_FSYNC

InnoDB uses O_DIRECT during flushing I/O, but skips the fsync() system call afterwards. This can provide some performance benefits if you are using a filesystem that does not require the fsync() to sync metadata.

I am deliberately not mentioning the experimental options littlesync and nosync.

There is also an extra option in Percona Server:

ALL_O_DIRECT

It uses O_DIRECT to open the log files and data files and uses fsync() to flush both the data and the log files.

Which InnoDB flushing method should I use?

The general consensus is that if you have a battery-backed write cache or a fast I/O subsystem (e.g. SSDs), O_DIRECT is the method to use. However, it is better practice to run tests to determine which method performs best in each particular environment.
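For reference, you can check which method a running server uses with a plain SHOW VARIABLES; keep in mind that innodb_flush_method is not dynamic, so changing it requires editing the configuration file and restarting MySQL:

SHOW GLOBAL VARIABLES LIKE 'innodb_flush_method';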

 

One downside of using O_DIRECT is that it requires the innodb-buffer-pool-size to be configured correctly. For example, if you accidentally leave your buffer pool size at the default value of 128M but have 16G of RAM, the data that does not fit in the buffer pool will at least sit in the filesystem cache. This will not be true if you have O_DIRECT enabled (I would like to thank Morgan Tocker for his contribution regarding this section of the post).

 

 

Categories: DBA Blogs

Log Buffer #472: A Carnival of the Vanities for DBAs

Pythian Group - Thu, 2016-05-05 09:14

This Log Buffer Edition takes into account blog posts from Oracle, SQL Server and MySQL.

Oracle:

Enterprise Manager Support Files 101- The EMOMS files

From time to time we see a complaint on OTN about the stats history tables being the largest objects in the SYSAUX tablespace and growing very quickly.

Delphix replication and push button cloud migration

PS360: A Utility to Extract and Present PeopleSoft Configuration and Performance Data

Contemplating Upgrading to OBIEE 12c?

SQL Server:

Modifying the SQL Server Model System Database to Customize New Database Settings

Azure SQL Database Elastic Database Jobs

SQL Server Resource Governor

Add a Custom Index in Master Data Services 2016

Unified Approach to Generating Documentation for PowerShell Cmdlets

MySQL:

Writing SQL that works on PostgreSQL, MySQL and SQLite

MariaDB MaxScale 1.4.2 GA is available for download

MariaDB ColumnStore, a new beginning

Planets9s – Watch the replay: Become a MongoDB DBA (if you’re really a MySQL user)

Upgrading to MySQL 5.7, focusing on temporal types

Categories: DBA Blogs

Compression -- 6b : Advanced Index Compression (revisited)

Hemant K Chitale - Thu, 2016-05-05 09:09
Following up on my earlier post on 12.1.0.2 Advanced Index Compression, one of my readers asked what would be the difference if I reversed the order of columns in the chosen index.

My defined index was on (OWNER, OBJECT_TYPE, OBJECT_NAME) --- defined as being from the column with the fewest distinct values to the most.  This ordering is best compressible with Index Key Compression (also known as Prefix Compression).  If I reverse the order, Index Key Compression for the two leading columns wouldn't deliver the same level of compression.  The question is whether Advanced Index Compression can intelligently handle the reversal.

SQL> create index target_data_ndx_3_comp on
2 target_data(object_name, object_type, owner) compress 2;

Index created.

SQL> exec dbms_stats.gather_index_stats('','TARGET_DATA_NDX_3_COMP');

PL/SQL procedure successfully completed.

SQL> select leaf_blocks
2 from user_indexes
3 where index_name = 'TARGET_DATA_NDX_3_COMP'
4 /

LEAF_BLOCKS
-----------
3091

SQL>


Surprisingly, this index with Prefix 2 on (OBJECT_NAME, OBJECT_TYPE) is, at 3,091 leaf blocks, smaller than the previous  index with Prefix 2 on (OWNER, OBJECT_TYPE) at 5,508 leaf blocks.

Continuing with Prefix 3

SQL> drop index target_data_ndx_3_comp;

Index dropped.

SQL> create index target_data_ndx_3_comp on
2 target_data(object_name, object_type, owner) compress 3;

Index created.

SQL> exec dbms_stats.gather_index_stats('','TARGET_DATA_NDX_3_COMP');

PL/SQL procedure successfully completed.

SQL> select leaf_blocks
2 from user_indexes
3 where index_name = 'TARGET_DATA_NDX_3_COMP'
4 /

LEAF_BLOCKS
-----------
2277

SQL>


At 2,277 leaf blocks it is, as expected, the same size as the earlier index with Prefix 3 on (OWNER, OBJECT_TYPE, OBJECT_NAME).  Since the entire index key is specified as the Prefix, both indexes would be the same size.

Going on to Advanced Index Compression

SQL> drop index target_data_ndx_3_comp;

Index dropped.

SQL> create index target_data_ndx_4_advcomp on
2 target_data(object_name, object_type, owner)
3 compress advanced low
4 /

Index created.

SQL> exec dbms_stats.gather_index_stats('','TARGET_DATA_NDX_4_ADVCOMP');

PL/SQL procedure successfully completed.

SQL> select leaf_blocks
2 from user_indexes
3 where index_name = 'TARGET_DATA_NDX_4_ADVCOMP'
4 /

LEAF_BLOCKS
-----------
2277

SQL>


This is, again, as expected.  Advanced Index Compression results in the same size irrespective of the ordering of the columns.

The advantage of Advanced Index Compression over Key or Prefix Compression is that the DBA does not need to determine the Prefix for compression.  He does not have to spend time to analyze the data and compare the number of distinct values for each of the columns in the composite index.
.
.
.

Categories: DBA Blogs

How to Deal with MetaData Lock

Pythian Group - Thu, 2016-05-05 08:59
What is MetaData Lock?

MySQL uses metadata locking to manage concurrent access to database objects, and to ensure data consistency when performing modifications to the schema: DDL operations. Metadata locking applies not just to tables, but also to schemas and stored programs (procedures, functions, triggers, and scheduled events).

In this post I am going to cover the metadata locks on tables and triggers that are usually seen by DBAs during regular operations and maintenance.

Kindly refer to these 4 different connections to MySQL Instance:

[Screenshot: four concurrent MySQL connections demonstrating the metadata lock]

 

The screenshot shows how an uncommitted transaction can cause a metadata lock that blocks ALTER operations. The ALTER will not proceed until the transaction is committed or rolled back. What is worse, once the ALTER is queued, any further queries against that table (even simple SELECT queries) will be blocked behind it. If the ALTER is an online DDL operation (available in 5.6+), queries will proceed as soon as the ALTER actually begins.

Refer to this video tutorial on MySQL Metadata Locks for further context.

These days we have the DBA's favourite tool, pt-online-schema-change (osc). Let's have a look at what happens if we run osc instead of an ALTER.

[Screenshot: pt-online-schema-change waiting on the metadata lock]

OSC gets stuck on the metadata lock at the point of creating the triggers on the table.

Let's jump to the second topic: how can we mitigate MDL issues?

Mitigating the MetaData Lock Issues

There are various solutions to tackling MDL:

  1. Appropriate setting of the wait_timeout variable, which will kill stuck/sleeping threads after a certain time (see the example after this list).
  2. Configure pt-kill to get rid of stuck/sleep threads  
  3. Fix code where transactions are not committed after performing DB queries
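As an illustration of the first option, the timeouts can be lowered at runtime (the 60-second value here is just an arbitrary example, not a recommendation; persist it in my.cnf so it survives a restart):

SET GLOBAL wait_timeout = 60;
SET GLOBAL interactive_timeout = 60;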
How to kill Sleep Connections in RDS which are causing MDL

If you are on RDS and your MySQL has a bunch of Sleep threads and you don't know which connection is causing the metadata lock, then you have to kill all the Sleep threads that have been idle for more than a certain time. As we know, "kill thread_id" is not permitted in RDS, but you can use the query below to generate the exact statements to kill the Sleep threads.

Example Output:

mysql> SELECT CONCAT('CALL mysql.rds_kill ( ',id,')',';') FROM INFORMATION_SCHEMA.PROCESSLIST WHERE COMMAND='Sleep' AND TIME > 10;
+---------------------------------------------+
| CONCAT('CALL mysql.rds_kill ( ',id,')',';') |
+---------------------------------------------+
| CALL mysql.rds_kill ( 5740758); |
| CALL mysql.rds_kill ( 5740802); |
| CALL mysql.rds_kill ( 5740745); |
| CALL mysql.rds_kill ( 5740612); |
| CALL mysql.rds_kill ( 5740824); |
| CALL mysql.rds_kill ( 5740636); |
| CALL mysql.rds_kill ( 5740793); |
| CALL mysql.rds_kill ( 5740825); |
| CALL mysql.rds_kill ( 5740796); |
| CALL mysql.rds_kill ( 5740794); |
| CALL mysql.rds_kill ( 5740759); |
| CALL mysql.rds_kill ( 5740678); |
| CALL mysql.rds_kill ( 5740688); |
| CALL mysql.rds_kill ( 5740817); |
| CALL mysql.rds_kill ( 5740735); |
| CALL mysql.rds_kill ( 5740818); |
| CALL mysql.rds_kill ( 5740831); |
| CALL mysql.rds_kill ( 5740795); |
| CALL mysql.rds_kill ( 4926163); |
| CALL mysql.rds_kill ( 5740742); |
| CALL mysql.rds_kill ( 5740797); |
| CALL mysql.rds_kill ( 5740832); |
| CALL mysql.rds_kill ( 5740751); |
| CALL mysql.rds_kill ( 5740760); |
| CALL mysql.rds_kill ( 5740752); |
| CALL mysql.rds_kill ( 5740833); |
| CALL mysql.rds_kill ( 5740753); |
| CALL mysql.rds_kill ( 5740722); |
| CALL mysql.rds_kill ( 5740723); |
| CALL mysql.rds_kill ( 5740724); |
| CALL mysql.rds_kill ( 5740772); |
| CALL mysql.rds_kill ( 5740743); |
| CALL mysql.rds_kill ( 5740744); |
| CALL mysql.rds_kill ( 5740823); |
| CALL mysql.rds_kill ( 5740761); |
| CALL mysql.rds_kill ( 5740828); |
| CALL mysql.rds_kill ( 5740762); |
| CALL mysql.rds_kill ( 5740763); |
| CALL mysql.rds_kill ( 5740764); |
| CALL mysql.rds_kill ( 5740773); |
| CALL mysql.rds_kill ( 5740769); |
| CALL mysql.rds_kill ( 5740770); |
| CALL mysql.rds_kill ( 5740771); |
| CALL mysql.rds_kill ( 5740774); |
| CALL mysql.rds_kill ( 5740784); |
| CALL mysql.rds_kill ( 5740789); |
| CALL mysql.rds_kill ( 5740790); |
| CALL mysql.rds_kill ( 5740791); |
| CALL mysql.rds_kill ( 5740799); |
| CALL mysql.rds_kill ( 5740800); |
| CALL mysql.rds_kill ( 5740801); |
| CALL mysql.rds_kill ( 5740587); |
| CALL mysql.rds_kill ( 5740660); |
+---------------------------------------------+
53 rows in set (0.02 sec)
  1. Capture the SQL statements that will kill the Sleep threads:

mysql -htest-server.us-west-2.rds.amazonaws.com. --skip-column-names -e 'SELECT CONCAT("CALL mysql.rds_kill ( ",id,")",";") FROM INFORMATION_SCHEMA.PROCESSLIST WHERE COMMAND="Sleep" AND TIME > 10' > kill_sleep_threads.sql

  2. Execute the queries from the mysql prompt:

mysql -htest-server.us-west-2.rds.amazonaws.com.

mysql> source kill_sleep_threads.sql
Improvements in MySQL 5.7 related to MDL

Generally, we would want to kill as few connections as possible. But the trouble with metadata locks prior to 5.7 is that there is no insight available into which threads are taking the metadata lock. In MySQL 5.7, there are several improvements in getting insight into metadata lock information.

The Performance Schema now exposes metadata lock information:

  • Locks that have been granted (shows which sessions own which current metadata locks)
  • Locks that have been requested but not yet granted (shows which sessions are waiting for which metadata locks).
  • Lock requests that have been killed by the deadlock detector or timed out and are waiting for the requesting session’s lock request to be discarded

This information enables you to understand metadata lock dependencies between sessions. You can see not only which lock a session is waiting for, but which session currently holds that lock.

The Performance Schema now also exposes table lock information that shows which table handles the server has open, how they are locked, and by which sessions.

To check who holds the metadata lock in MySQL 5.7, we have to enable the global_instrumentation consumer and the wait/lock/metadata/sql/mdl instrument.

Below is an example of enabling both:

mysql> UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME = 'global_instrumentation';

Query OK, 0 rows affected (0.00 sec)

Rows matched: 1  Changed: 0  Warnings: 0

mysql> UPDATE performance_schema.setup_instruments SET ENABLED = 'YES' WHERE NAME = 'wait/lock/metadata/sql/mdl';

Query OK, 1 row affected (0.00 sec)

Rows matched: 1  Changed: 1  Warnings: 0

Once global_instrumentation and wait/lock/metadata/sql/mdl are enabled, the query below will show the lock status of each connection.

 

mysql> SELECT OBJECT_TYPE, OBJECT_SCHEMA, OBJECT_NAME, LOCK_TYPE, LOCK_STATUS, THREAD_ID, PROCESSLIST_ID, PROCESSLIST_INFO FROM performance_schema.metadata_locks INNER JOIN performance_schema.threads ON THREAD_ID = OWNER_THREAD_ID WHERE PROCESSLIST_ID <> CONNECTION_ID();
+-------------+---------------+-------------+---------------------+-------------+-----------+----------------+------------------------------------------+
| OBJECT_TYPE | OBJECT_SCHEMA | OBJECT_NAME | LOCK_TYPE | LOCK_STATUS | THREAD_ID | PROCESSLIST_ID | PROCESSLIST_INFO |
+-------------+---------------+-------------+---------------------+-------------+-----------+----------------+------------------------------------------+
| TABLE | sbtest | sbtest1 | SHARED_READ | GRANTED | 29 | 4 | NULL |
| GLOBAL | NULL | NULL | INTENTION_EXCLUSIVE | GRANTED | 30 | 5 | alter table sbtest1 add key idx_pad(pad) |
| SCHEMA | sbtest | NULL | INTENTION_EXCLUSIVE | GRANTED | 30 | 5 | alter table sbtest1 add key idx_pad(pad) |
| TABLE | sbtest | sbtest1 | SHARED_UPGRADABLE | GRANTED | 30 | 5 | alter table sbtest1 add key idx_pad(pad) |
| TABLE | sbtest | sbtest1 | EXCLUSIVE | PENDING | 30 | 5 | alter table sbtest1 add key idx_pad(pad) |
| TABLE | sbtest | sbtest1 | SHARED_READ | PENDING | 31 | 6 | select count(*) from sbtest1 |
+-------------+---------------+-------------+---------------------+-------------+-----------+----------------+------------------------------------------+
6 rows in set (0.00 sec)

Here PROCESSLIST_ID 4 holds the granted SHARED_READ lock, PROCESSLIST_ID 5 (the ALTER) is waiting for an EXCLUSIVE lock, and PROCESSLIST_ID 6 (the SELECT) is queued behind it with its SHARED_READ request PENDING.

Conclusion

Best practice when running any DDL operation, even with the performance schema improvements in 5.7, is to check the processlist for the presence of MDL waits and check SHOW ENGINE INNODB STATUS for long-running active transactions. Kill the DDL operation while resolving the MDL issue so as to prevent a query pileup. For a temporary fix, implement pt-kill or wait_timeout. To solve the metadata lock issue for good, review and fix any application code or scripts that leave transactions uncommitted.

Categories: DBA Blogs

Partner Webcast - Oracle Database Cloud Service: Cloud Migration Options

Oracle’s Public Cloud leverages existing technologies and best practices to deliver Oracle Database Technologies utilizing Oracle’s secure and optimized cloud-computing platform. At cloud.oracle.com...

We share our skills to maximize your revenue!
Categories: DBA Blogs

Transparent Data Encryption for SQL Server in an Availability Group

Pythian Group - Tue, 2016-05-03 13:24

With the all new features in SQL Server 2016 always on, which you can read up on here, it’s easy to forget about Transparent Data Encryption (TDE). This blog post will focus on TDE.

TDE encrypts database files at rest. What this means is your .MDF and .NDF Files, and consequently your backups, will be encrypted, meaning you will not be able to detach the database files and restore them on another server unless that server has the same certificate that was used to encrypt the database.

In this blog post I am using SQL Server 2014 and will explain how to enable TDE on an existing Availability Group database.

  1. The first thing we need to check is whether the server has a master encryption key on all replicas in the Availability Group:
USE MASTER
GO
SELECT * FROM sys.symmetric_keys
WHERE name = '##MS_DatabaseMasterKey##'

The screenshot below shows I don't have a key, so I need to create one.

No Master Encryption Key

  2. Create a database master encryption key on each of the replicas in the Availability Group. It is important to use a complex password.

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'C&mpl£xP@$$Wrd'
GO

  3. Run the code from step 1 again, and this time you should see the below.

Master Encryption Key

  4. Now we need to create a certificate to use for encrypting the database on the primary replica. This can be accomplished with the following:

CREATE CERTIFICATE BackupEncryptionCert
WITH SUBJECT = 'SQL Server 2014 AdventureWorks2012 Encryption Certificate';
GO

  5. Validate the Certificate

SELECT name, pvt_key_encryption_type_desc, thumbprint FROM sys.certificates

Validate Encryption Key

The thumbprint will be useful because when a database is encrypted, it will indicate the thumbprint of the certificate used to encrypt the Database Encryption Key.  A single certificate can be used to encrypt more than one Database Encryption Key, but there can also be many certificates on a server, so the thumbprint will identify which server certificate is needed

  6. Next we need to back up the certificate on the primary replica:

BACKUP CERTIFICATE BackupEncryptionCert
TO FILE = 'C:\BackupCertificates\BackupEncryptionCert.bak'
WITH PRIVATE KEY ( FILE = 'C:\BackupCertificates\BackupEncryptionCertKey.bak' ,
ENCRYPTION BY PASSWORD = 'Certi%yC&mpl£xP@$$Wrd')

Encryption Files

The BACKUP CERTIFICATE command will create two files.  The first file is the server certificate itself.  The second file is a “private key” file, protected by a password. Both files and the password will be used to restore the certificate onto other instances.

  7. The files created in step 6 need to be copied to each of the other replicas and the certificate created in SQL Server there. After the files are copied, the command below can be used to create the certificates:

CREATE CERTIFICATE BackupEncryptionCert
FROM FILE = 'C:\BackupCertificates\BackupEncryptionCert.bak'
WITH PRIVATE KEY (FILE = 'C:\BackupCertificates\BackupEncryptionCertKey.bak',
DECRYPTION BY PASSWORD = 'Certi%yC&mpl£xP@$$Wrd');

  8. That's all the configuration needed for each instance; now we are ready to start encrypting the database. We need to tell SQL Server which encryption algorithm we want to use and which certificate to use. This can be done by running the following code on the primary replica:

Use Adventureworks2012
GO
CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_256
ENCRYPTION BY SERVER CERTIFICATE BackupEncryptionCert

  9. Finally, the last step is to enable TDE by executing the command below on the primary replica:

ALTER DATABASE AdventureWorks2012 SET ENCRYPTION ON
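To confirm the database has finished encrypting, a quick check of the DMVs can be run on any replica (this query is a generic sketch, not part of the original post; an encryption_state of 3 means the database is encrypted):

SELECT DB_NAME(database_id) AS database_name,
       encryption_state,
       percent_complete
FROM sys.dm_database_encryption_keys;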

 

And that’s it, I hope you enjoyed this tutorial and found it informative. If you have any questions, please comment below.

Categories: DBA Blogs

Compression -- 6 : Advanced Index Compression

Hemant K Chitale - Tue, 2016-05-03 09:23
Earlier, I had covered Index (Key) Compression which is included in the Enterprise Edition.

In Key Compression, the DBA must specify the Prefix (i.e. number of leading columns in a composite index) that must be used as the compression key.

12.1.0.2 Advanced Index Compression does not require the DBA to manually identify the prefix key length.  Advanced Index Compression auto(magically) identifies the optimal prefix columns.
(Note : Advanced Index Compression requires the Advanced Compression licence Option and is available only in 12.1.0.2 and higher)

SQL> create table target_data as select * from source_data where 1=2;

Table created.

SQL> insert /*+ APPEND */ into target_data select * from source_data;

367156 rows created.

SQL> commit;

Commit complete.

SQL> insert /*+ APPEND */ into target_data select * from source_data;

367156 rows created.

SQL> commit;

Commit complete.

SQL> insert /*+ APPEND */ into target_data select * from source_data;

367156 rows created.

SQL> commit;

Commit complete.

SQL>


Creating a Key Compression Index by specifying the Prefix size (the leading 2 columns) :

SQL> create index target_data_ndx_1_comp on
2 target_data (owner, object_type, object_name) compress 2;

Index created.

SQL> exec dbms_stats.gather_index_stats('','TARGET_DATA_NDX_1_COMP');

PL/SQL procedure successfully completed.

SQL> select leaf_blocks
2 from user_indexes
3 where index_name = 'TARGET_DATA_NDX_1_COMP'
4 /

LEAF_BLOCKS
-----------
5508

SQL>


Note how I specified "2" as the Prefix size as I want to compress on repeated values of (OWNER, OBJECT_TYPE).

Using Advanced Index Compression by specifying "COMPRESS ADVANCED LOW" and letting Oracle decide on the compression strategy in each leaf block :

SQL> drop index target_data_ndx_1_comp;

Index dropped.

SQL> create index target_data_ndx_2_advcomp on
2 target_data (owner, object_type, object_name)
3 compress advanced low;

Index created.

SQL> exec dbms_stats.gather_index_stats('','TARGET_DATA_NDX_2_ADVCOMP');

PL/SQL procedure successfully completed.

SQL> select leaf_blocks
2 from user_indexes
3 where index_name = 'TARGET_DATA_NDX_2_ADVCOMP'
4 /

LEAF_BLOCKS
-----------
2277

SQL>


Wow, that's significantly smaller.  What's more, I did not have to spend time analyzing the data and the index definition to identify the "correct" Prefix size.

However, it is now possible to specify the entire composite key as the Prefix, although that is not what I would have done in earlier versions.  Identifying the Prefix size requires analyzing the data.

SQL> create index target_data_ndx_1_comp on
2 target_data (owner, object_type, object_name) compress 3
3 /

Index created.

SQL> exec dbms_stats.gather_index_stats('','TARGET_DATA_NDX_1_COMP');

PL/SQL procedure successfully completed.

SQL> select leaf_blocks
2 from user_indexes
3 where index_name = 'TARGET_DATA_NDX_1_COMP'
4 /

LEAF_BLOCKS
-----------
2277

SQL>


So, the Advanced option allows me to let Oracle automatically decide the appropriate mechanism to compress the Index keys.

UPDATE : Also see the subsequent test with a reversal of the columns in the composite index.

Just for comparison, here is a regular index :

SQL> drop index target_data_ndx_2_advcomp;

Index dropped.

SQL> create index target_data_ndx_3_nocomp on
2 target_data (owner, object_type, object_name)
3 /

Index created.

SQL> exec dbms_stats.gather_index_stats('','TARGET_DATA_NDX_3_NOCOMP');

PL/SQL procedure successfully completed.

SQL> select leaf_blocks
2 from user_indexes
3 where index_name = 'TARGET_DATA_NDX_3_NOCOMP'
4 /

LEAF_BLOCKS
-----------
7289

SQL>


That is a much larger regular index !
.
.
.

Categories: DBA Blogs

Links for 2016-05-02 [del.icio.us]

Categories: DBA Blogs

SQL vs. Excel – Subgroup medians

RDBMS Insight - Mon, 2016-05-02 19:34

Recently I ran across this post on how to do subgroup medians in Excel 2010. First you need to create a pivot table, then “do some copying and pasting and use a formula to make it happen”. In SQL you can do this with one command.

Suppose that you have the same table as the Excel article, something like this:

CREATE TABLE sampletab
(arrest_day_of_week varchar2(10), 
arrest_ts TIMESTAMP, 
fingerprint_ts TIMESTAMP, 
days_between NUMBER);

and you want to get the median value of days_between for each day of the week.

The steps in Excel apparently go like this:

  1. Create pivot table to get the means
  2. Copy and paste the column values from the pivot table
  3. For Sunday, create an IF formula to include a cell’s days_between in the median calculation only if the arrest_day_of_week for that row is Sunday
  4. Repeat for other six days

Ouch!

In SQL, it’s one simple statement:

SELECT arrest_day_of_week, median(days_between) AS median_days_between FROM sampletab
GROUP BY arrest_day_of_week;
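For a quick try-out, a few hypothetical rows (made up for illustration) show the grouping in action; MEDIAN here is Oracle's built-in aggregate:

INSERT INTO sampletab VALUES ('Sunday', TIMESTAMP '2016-01-03 10:00:00', TIMESTAMP '2016-01-05 10:00:00', 2);
INSERT INTO sampletab VALUES ('Sunday', TIMESTAMP '2016-01-10 09:00:00', TIMESTAMP '2016-01-16 09:00:00', 6);
INSERT INTO sampletab VALUES ('Monday', TIMESTAMP '2016-01-04 08:00:00', TIMESTAMP '2016-01-07 08:00:00', 3);
COMMIT;

With those rows, the query returns a median of 4 for Sunday and 3 for Monday.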

Conclusion – if you’re into data analysis, SQL can be a big time-saver!

Categories: DBA Blogs

Reserved words usage in MySQL

Pythian Group - Mon, 2016-05-02 15:07

It is not uncommon to come across MySQL databases where reserved words are in use as identifiers for any kind of database objects.

Perhaps when the application schema was implemented, the words were not reserved yet, and they became reserved later on a subsequent MySQL release.

It is a good practice to check reserved words usage prior to doing any database upgrades, as any newly reserved keywords will cause syntax errors on the new version.

This is usually not a problem if proper quoting is used for referencing the objects, as described on the official manual page.

The actual steps to do this depend on the environment; for example, the following can be configured to tell Hibernate to escape identifiers:

<property name="hibernate.globally_quoted_identifiers" value="true"/>

This does not appear to be documented properly (there is an open bug unresolved at the time of this writing).

However, we cannot make the assumption that all application code is properly escaped to deal with this kind of issue.

So what are the symptoms?

Error 1064 will be reported while trying to use a reserved word:

mysql> CREATE TABLE interval (begin INT, end INT);
ERROR 1064 (42000): You have an error in your SQL syntax ...
near 'interval (begin INT, end INT)'
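Quoting the identifiers avoids the error; for example, using backticks (just an illustration of the quoting syntax):

CREATE TABLE `interval` (`begin` INT, `end` INT);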
How can we check for reserved words?

The following procedure can help you find out if any particular MySQL version’s reserved words are in use:

  1. Using the list on the corresponding manual page, create a text file with one reserved word on each line
  2. Load data into a temporary table
     USE test;
    CREATE TABLE reserved_words (word VARCHAR(50)); 
    LOAD DATA INFILE 'reserved_words.txt' INTO TABLE test.reserved_words;
    
  3. Check for any column names using reserved keywords
    SELECT table_schema, table_name, column_name, ordinal_position 
    FROM information_schema.columns
    WHERE table_schema NOT IN ( 'mysql', 'information_schema', 'performance_schema' ) 
    AND column_name = ANY ( SELECT * FROM test.reserved_words ) 
    ORDER BY 1,2,4;
  4. Check for any table names using reserved keywords
    SELECT table_schema, table_name
    FROM information_schema.tables
    WHERE table_schema NOT IN ( 'mysql', 'information_schema', 'performance_schema' ) 
    AND table_name = ANY ( SELECT * FROM test.reserved_words );
  5. Check for any procedures or functions
    SELECT routine_schema, routine_name, routine_type
    FROM information_schema.routines
    WHERE routine_schema NOT IN ( 'mysql', 'information_schema', 'performance_schema' )
    AND routine_name = ANY ( SELECT * FROM test.reserved_words );

I hope this post helps you avoid one of the many issues you may encounter during the challenging task of database upgrades.

Categories: DBA Blogs

#GoldenGate #Cloud Service (#GGCS) … what to expect?

DBASolved - Mon, 2016-05-02 07:30

As I sit here working on performing some GoldenGate migrations to AWS for a client, I’ve been thinking about the glimpse of GoldenGate Cloud Service (GGCS) that was provided to me earlier this week. That glimpse has helped me define what and how GGCS is going to work within the Oracle Cloud space. Ever since this service was announced back at Oracle Open World 2015, I’ve been wanting to get my hands on this cloud product from Oracle to just better understand it. Hopefully, what I’m about to share with you will provide some insight into what to expect.

First, you will need a cloud account. If you do not have a cloud account; visit http://cloud.oracle.com and sign up for an account. This will typically be the same account you use to login to My Oracle Support (MOS).

Once you have an account and are in the cloud interface, subscribe to some services. You will need a Database Cloud Service or a Compute Cloud Service. These services will be the end points for GGCS to point to. As part of setting up the compute node, you will need to set up SSH access with a public/private key. Once you create the GGCS instance, the same public/private key should be used to keep everything simple.

Once GGCS is made available for trial (currently it is only available through the sales team), many of us will have the opportunity to play with it. The following screen captures and comments were taken from the interface I had access to while discussing GGCS with Oracle Product Management.

Like any of the other cloud services from Oracle, once you have access to GGCS it will appear in your dashboard as available cloud services. In the figure below, GGCS is listed at the top of the services that I had access to. You will notice over on the right, there is a link called “service console”.

When you click on the service console link, you are taken to the console that is specific to GGCS. On the left hand side of the console, you will see three primary areas. The “overview” area is the important one; it provides you with all the information needed about your GGCS environment. You will see the host and port number, what version of GGCS you are running and the status of your environment.

With the environment up and running, you will want to create a new GGCS instance. This instance is created under your cloud service console. On this screen you are given information that tells you how many instances you have running with the number of OCPUs, Memory and storage for the configuration along with the public IP address. Notice the button to the right, just below Public IPs, this is the button that allows you to create a new GGCS instance. In the figure below, the instance has already been created.

Drilling down into the instance, you are taken to a page that illustrates your application nodes for GGCS. Notice that the GGCS instance actually created a compute node VM to run GoldenGate from.

With everything configured from the Oracle Cloud interface, you can now access the cloud server using the details provided (I do not have current screen shots of this). Once you access the cloud server, you will find that Oracle GoldenGate has been configured for you along with a TNS entry that points to a "target" location. These items are standard template items for you to build your GoldenGate environment from. The interesting thing about this configuration is that Oracle is providing a single virtual machine (compute node) that handles all the apply processes for a database (compute node).

With the GGCS service running, you are then ready to build out your GoldenGate environment.

Like many other GoldenGate architectures, you build out the source side of the architecture like anything else. You install the GoldenGate software, build an extract, trail files and a data pump. The data pump process is then pointed to the GoldenGate Cloud Service (GGCS) instance instead of the target instance. The local trail files will be shipped to the GGCS machine. Once on the GGCS instance, the replicat would need to be configured. Part of the configuration of the replicat at this point is updating the TNSNames.ora file to point to the correct “target” compute node/database instance. The below picture illustrates this concept.

You will notice that GGCS is set up to be an intermediary point in the cloud. This allows you to be flexible with your GoldenGate architecture in the cloud. From a single GGCS service you can run multiple replicats that point to multiple different cloud compute nodes, turning your GGCS instance into a hub that can send data to multiple cloud resources.

In talking with the Oracle Product team about GGCS, the only downside to GGCS right now is that it cannot be used for a bi-directional setup or for pulling data from the cloud. In essence, this is a uni-directional setup that can help you move from on-premise to cloud with minimal configuration needed.

Well, this is my take on GGCS as of right now. Once GGCS trials are available, I'll try to update this post or add more posts on this topic. Until then, I hope you have gained a bit of information on this topic and are looking forward to using GGCS.

Enjoy!!

about.me: http://about.me/dbasolved


Filed under: Cloud, Golden Gate
Categories: DBA Blogs

FBDA -- 7 : Maintaining Partitioned Source Table

Hemant K Chitale - Mon, 2016-05-02 02:46
Taking up the TEST_FBDA_PARTITIONED table,  let's look at a couple of Partition Maintenance operations.

SQL> select partition_name, high_value, num_rows
2 from user_tab_partitions
3 where table_name = 'TEST_FBDA_PARTITIONED'
4 order by partition_position
5 /

PARTITION_NAME HIGH_VALUE NUM_ROWS
---------------- ------------------------- ----------
P_100 101 100
P_200 201 100
P_300 301 100
P_400 401 100
P_MAX MAXVALUE 301

SQL>


Let's try a TRUNCATE PARTITION

SQL> alter table test_fbda_partitioned truncate partition p_100;

Table truncated.

SQL>


So, that's supported.

Let's try a SPLIT PARTITION

SQL> alter table test_fbda_partitioned       
2 split partition p_max at (501)
3 into (partition p_500, partition p_max)
4 /
alter table test_fbda_partitioned
*
ERROR at line 1:
ORA-55610: Invalid DDL statement on history-tracked table


SQL>


So, a SPLIT PARTITION fails.  We need to DISASSOCIATE the Flashback Archive.

SQL> execute dbms_flashback_archive.disassociate_fba('HEMANT','TEST_FBDA_PARTITIONED');

PL/SQL procedure successfully completed.

SQL> select table_name, flashback_archive_name, archive_table_name, status
2 from user_flashback_archive_tables
3 where table_name = 'TEST_FBDA_PARTITIONED'
4 /

TABLE_NAME
--------------------------------------------------------------------------------
FLASHBACK_ARCHIVE_NAME
--------------------------------------------------------------------------------
ARCHIVE_TABLE_NAME STATUS
----------------------------------------------------- -------------
TEST_FBDA_PARTITIONED
FBDA
SYS_FBA_HIST_93342 DISASSOCIATED


SQL>
SQL> alter table test_fbda_partitioned
2 split partition p_max at (501)
3 into (partition p_500, partition p_max)
4 /

Table altered.

SQL> execute dbms_flashback_archive.reassociate_fba('HEMANT','TEST_FBDA_PARTITIONED');

PL/SQL procedure successfully completed.

SQL>
SQL> select table_name, flashback_archive_name, archive_table_name, status
2 from user_flashback_archive_tables
3 where table_name = 'TEST_FBDA_PARTITIONED'
4 /

TABLE_NAME
--------------------------------------------------------------------------------
FLASHBACK_ARCHIVE_NAME
--------------------------------------------------------------------------------
ARCHIVE_TABLE_NAME STATUS
----------------------------------------------------- -------------
TEST_FBDA_PARTITIONED
FBDA
SYS_FBA_HIST_93342 ENABLED


SQL>


While a table is disassociated from its Flashback Archive, DDL that would not normally be permitted may be done, under strict control to ensure that there is no data divergence.

.
.
.
Categories: DBA Blogs

Partition Storage -- 7 : Revisiting HWM - 2 (again)

Hemant K Chitale - Mon, 2016-05-02 02:19
Revisiting the previous test case, but with a larger AVG_ROW_LEN

SQL> create table part_table_large
(id_column number(6), data_col_1 varchar2(100), data_col_2 varchar2(100))
partition by range (id_column)
(partition p_100 values less than (101),
partition p_200 values less than (201),
partition p_300 values less than (301),
partition p_400 values less than (401),
partition p_max values less than (maxvalue))
/
2 3 4 5 6 7 8 9

Table created.

SQL>
SQL> insert into part_table_large values
(51,rpad('String',75,'X'), rpad('Another',60,'Y'))
2 3
SQL> /

1 row created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> declare
cntr number;
begin
cntr := 0;
while cntr < 100000
loop
insert into part_table_large
values (25, rpad('String',75,'X'), rpad('Another',60,'Y'));
commit;
cntr := cntr + 1;
end loop;
end;
2 3 4 5 6 7 8 9 10 11 12 13
14 /

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

SQL>
SQL> declare
cntr number;
begin
cntr := 0;
while cntr < 500001
loop
insert into part_table_large
values (45, rpad('String',75,'X'), rpad('Another',60,'Y'));
commit;
cntr := cntr + 1;
end loop;
end;
2 3 4 5 6 7 8 9 10 11 12 13
14 /

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

SQL>
SQL> declare
cntr number;
begin
cntr := 0;
while cntr < 500001
loop
insert into part_table_large
values (55, rpad('String',75,'X'), rpad('Another',60,'Y'));
commit;
cntr := cntr + 1;
end loop;
end;
2 3 4 5 6 7 8 9 10 11 12 13
14 /

PL/SQL procedure successfully completed.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> exec dbms_stats.gather_table_stats('','PART_TABLE_LARGE',granularity=>'ALL');

PL/SQL procedure successfully completed.

SQL>
SQL> select avg_row_len, num_rows, blocks
from user_tab_partitions
where table_name = 'PART_TABLE_LARGE'
and partition_name = 'P_100'
/
2 3 4 5
AVG_ROW_LEN NUM_ROWS BLOCKS
----------- ---------- ----------
140 1100003 22349

SQL>
SQL>
SQL> alter table part_table_large move partition p_100 ;

Table altered.

SQL>
SQL> exec dbms_stats.gather_table_stats('','PART_TABLE_LARGE',granularity=>'ALL');

PL/SQL procedure successfully completed.

SQL>
SQL> select avg_row_len, num_rows, blocks
from user_tab_partitions
where table_name = 'PART_TABLE_LARGE'
and partition_name = 'P_100'
/
2 3 4 5
AVG_ROW_LEN NUM_ROWS BLOCKS
----------- ---------- ----------
140 1100003 22626

SQL>
SQL>
SQL> select extent_id, blocks
from dba_extents
where segment_name = 'PART_TABLE_LARGE'
and segment_type = 'TABLE PARTITION'
and partition_name = 'P_100'
and owner = 'HEMANT'
order by 1
/
2 3 4 5 6 7 8
EXTENT_ID BLOCKS
---------- ----------
0 1024
1 1024
2 1024
3 1024
4 1024
5 1024
6 1024
7 1024
8 1024
9 1024
10 1024
11 512
12 1024
13 1024
14 1024
15 1024
16 1024
17 1024
18 1024
19 1024
20 1024
21 1024
22 1024

23 rows selected.

SQL>


Aha! Unlike the previous case (where, with an AVG_ROW_LEN of 11, a MOVE reduced the HWM from 3,022 to 2,484), with a larger row size the HWM has moved from 22,349 to 22,626.

So, space consumption depends on both the AVG_ROW_LEN and the manner in which the rows are inserted / relocated.

SQL> l
1 select avg_row_len*num_rows*1.2/8192 Expected_Blocks, Blocks
2 from user_tab_partitions
3 where table_name = 'PART_TABLE_LARGE'
4* and partition_name = 'P_100'
SQL> /

EXPECTED_BLOCKS BLOCKS
--------------- ----------
22558.6553 22626

SQL>

Also, see how the "Expected Blocks" count seems more accurate than earlier.
.
.
.



Categories: DBA Blogs

Speakers: Put your Twitter Handle on the Windows taskbar!

The Oracle Instructor - Sat, 2016-04-30 04:56

If you speak often at conferences, sharing your screen to demo things, this could be helpful:

Twitter Handle on the Windows taskbar

Throughout your presentation, the audience will be able to see your Twitter Handle, reminding them to include it with tweets about the event. I used to include it in the slides, but this is better, because it works also with live demonstrations where no slides are being showed. Which is incidentally my favorite way to do presentations:-)

Now how can you do it? Quite easy, you open the Windows Control Panel and click on Region and Language. Then click on Additional settings:

Region and Language 1

Then you insert your Twitter Handle (or any other text you like to see on the taskbar) as the AM and PM symbols. Make sure to select Time formats with trailing tt:

Region and Language 2

That’s it. If you want the font size as large as on the first picture above, that can be done here:

[Screenshot: taskbar clock font size setting]

I did that with Windows 7 Professional 64 bit. Hope you find it useful:-)


Tagged: speaker tip
Categories: DBA Blogs
