
Yann Neuhaus

All blog entries from http://www.dbi-services.com/

EMC World Las Vegas – Momentum 2015 second day

Tue, 2015-05-05 17:05

For the second day of conferences, I attended a first session about migration from Webtop to D2 or xCP. Then we attended two sessions about Documentum platform performance tuning.

 

We could see EMC putting real effort into taking advantage of open source software. They have started to package the whole platform, component by component, into Docker containers. The hope is to simplify upgrades from one version to another.

 

They also invited the audience to migrate to Documentum 7.1/7.2 because many performance enhancements were made, especially for multi-core CPU support and better session pooling management; the latter topic cost us some maintenance time in recent months.

 

A key advantage of migrating from Webtop to D2 or xCP is their capability to cover a lot of business case scenarios out of the box. For instance, when a customer wants to move customizations into D2, they said that from experience we could reach up to 50% feature coverage through software configuration instead of coding. That is a great saving of time and money, as well as of maintenance costs over time and upon further upgrades.

 

Finally, they also stated that EMC provides a few tools to ease the migration process, both from the former Life Sciences appliance to the current one and from Webtop to D2 and xCP.

 

For Documentum platform performance tuning, I invite you to read Gérard's blog by following this link.

 

I hope you enjoyed reading this summary of today at EMC World – Momentum 2015. Thanks for your attention.

EMC World 2015 - Day 2 at Momentum

Tue, 2015-05-05 16:53

Second day at this amazing event. There are not only general and presentation sessions; you can also participate in a so-called "Hands-on Lab". The subject was "EMC Documentum Platform Performance Tuning". So learning by doing is another opportunity you can use at Momentum to enhance your skills.


The session covered performance tuning using Fiddler for the HTTP requests, the DFC trace, SQL traces and how to use the related execution plans, and at the end, how to tune xPlore, but only for the ingestion phase, meaning the indexing of the documents. The tuning of the fulltext search was not addressed.

Most of the tips I learned were on the xPlore side: which parameters to set to increase performance or to avoid some errors due to timeouts, for instance. I more or less skipped the database tuning. Why? That's not acceptable, you would say. Well, because we have the experts at dbi services to do this kind of tuning!

So let me give you some information.

DFC trace

In the dfc.properties file, add:

dfc.tracing.enable=true

dfc.tracing.verbose=true

dfc.tracing.max_stack_depth=0

dfc.tracing.include_rpcs=true

dfc.tracing.mode=compact

dfc.tracing.include_session_id=true

dfc.tracing.dir=c:\dctm\trace


dfc.tracing.enable can be changed from false to true at runtime, and after a couple of seconds the trace file will be created. Of course, once the value is set back to false, the tracing stops.

xPlore tuning

I would recommend relying on the documentation, but here are some tips:
- disable fulltext indexing for a specific format by setting can_index=false
- filter which documents are not indexed by excluding cabinets, folders or even document types
- reduce the online rebuild index max queue size
- reduce the number of indexing threads and CPS request threads
- reduce the number of documents processed simultaneously by the CPS internal processing queue
- increase the number of CPS daemon processes

WARNING: each parameter can have some drawbacks, so take care when you change these values.

So this was in a lab; let's apply it to real cases at customer sites!

SQL Server vNext becomes SQL Server 2016

Tue, 2015-05-05 15:33

 

Satya Nadella, CEO of Microsoft, announced at the Microsoft Ignite keynote in Chicago that the SQL Server 2016 public preview will come this summer, and you can be sure that at dbi services we're looking forward to taking a look under the hood and testing this next major release.

Amid all the good news, here are the items that caught my attention:

 

(Image: Always Encrypted)

 

First of all, because we’re working in Switzerland, security is a main concern at some customer places and security features like Always Encrypted arouses all of our interest. Indeed, a big improvement here concerns the protection of sensitive data performed at rest and also in motion. In addition, the encryption key is not anymore stored in the SQL Server instance and now resides with the application in the customer environment in a secure location. Microsoft will also provide support for row level security and dynamic data masking. The latter will probably focus the eyes of some of our customers. I Think we will hear about these new features the next few months ...

 

(Image: Stretch tables)

 

Another interesting feature is certainly stretch tables, but in fact this is not really news because it was first announced during the previous PASS Summit. This is clearly a hyper-scale cloud feature, mainly focused on hybrid scenarios. You can think of this feature as a kind of table partitioning where historical (cold) data is dynamically stretched into the cloud.

 

(Image: In-Memory analytics)

 

When we talk about SQL Server 2014, we automatically think about Hekaton. Will it be the same with SQL Server 2016 and the new real-time operational analytics & In-Memory OLTP feature? At the moment it is too early to speculate, but let's say that this feature will interest a lot of folks, including the dbi services Microsoft team :-) The idea seems pretty simple: combine the existing In-Memory OLTP tables and the in-memory columnstore capabilities to obtain a mix that increases the scope of possible scenarios (OLTP and analytics). This feature needs to be tested for sure!

 

 

(Image: Availability Groups)

 

More good news concerns the high-availability topic, especially availability groups, which will now support MSDTC and SSIS. AGs will be able to use native load balancing for reporting workloads against the secondaries. In addition, it will be possible to use three synchronous replicas, and we will be able to configure a failover policy based on database health.
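As a hedged illustration of the announced load-balanced read-only routing, the routing list should accept nested groups of replicas (the replica names below are hypothetical and the syntax may change until the final release):

ALTER AVAILABILITY GROUP [AG_demo]
MODIFY REPLICA ON N'SRV1' WITH
(PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = (('SRV2', 'SRV3'), 'SRV4')));
-- read-intent connections are load balanced between SRV2 and SRV3, then fall back to SRV4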

 


Finally, you will note the integration of PolyBase into SQL Server (previously PolyBase was available only with APS) to extend access to both unstructured and structured data by using the T-SQL language. Thus, we come full circle: transaction processing, data warehousing, BI activities and now big data capabilities handled by the same engine.
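To illustrate what this could look like, here is a hedged sketch of the announced PolyBase T-SQL objects (the data source location, file format and table definition are hypothetical and may change in the final release):

CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (TYPE = HADOOP, LOCATION = 'hdfs://10.0.0.1:8020');

CREATE EXTERNAL FILE FORMAT TextFileFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = '|'));

CREATE EXTERNAL TABLE dbo.WebLogs
(
   log_date datetime2,
   url      varchar(500),
   duration int
)
WITH (LOCATION = '/logs/', DATA_SOURCE = HadoopCluster, FILE_FORMAT = TextFileFormat);

-- unstructured data queried with plain T-SQL, possibly joined to relational tables
SELECT TOP 10 * FROM dbo.WebLogs;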

You can download the SQL Server 2016 datasheet here.

More good news will probably come before the final release of SQL Server 2016, and we will have the opportunity to blog about it. So stay connected!

 


When tempdb database may cause indirect issues

Tue, 2015-05-05 13:23

A couple of weeks ago, I had an interesting discussion with one of my friends who faced a weird issue with a SQL Server instance that lacked worker threads. Because I can't use his own information, I decided to reproduce the same issue in order to share some interesting findings with you. The next part of this blog post therefore refers exclusively to my own test, but it represents exactly the real issue encountered by my friend.

Let’s set the scene: in the SQL Server error log we found the following records related to our issue:

 

2015-04-16 14:06:16.54 Server     New queries assigned to process on Node 0 have not been picked up by a worker thread in the last 300 seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time. Use the "max worker threads" configuration option to increase number of allowable threads, or optimize current running queries. SQL Process Utilization: 0%. System Idle: 94%.

2015-04-16 14:11:16.74 Server     New queries assigned to process on Node 0 have not been picked up by a worker thread in the last 600 seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time. Use the "max worker threads" configuration option to increase number of allowable threads, or optimize current running queries. SQL Process Utilization: 0%. System Idle: 92%.

 

At the same time, we found an event related to the generation of a SQL Server dump file triggered by the deadlocked schedulers monitor, as follows:

 

2015-04-16 14:06:04.66 Server     **Dump thread - spid = 0, EC = 0x0000000000000000
2015-04-16 14:06:04.83 Server     ***Stack Dump being sent to C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\Log
2015-04-16 14:06:04.83 Server     * *******************************************************************************
2015-04-16 14:06:04.83 Server     *
2015-04-16 14:06:04.83 Server     * BEGIN STACK DUMP:
2015-04-16 14:06:04.83 Server     *   16/04/15 14:06:04 spid 6392
2015-04-16 14:06:04.83 Server     *
2015-04-16 14:06:04.83 Server     * Deadlocked Schedulers
2015-04-16 14:06:04.83 Server     *
2015-04-16 14:06:04.83 Server     * *******************************************************************************
2015-03-27 01:05:02.83 Server     * -------------------------------------------------------------------------------
2015-03-27 01:05:02.83 Server     * Short Stack Dump

 

I began my investigation by having a first look at the SQL Server error log file, but I didn't find any relevant information. So, I decided to move on to the analysis of the dump file.

My first idea was to look at the runaway threads at the moment of the dump file generation (!runaway command). In fact, I first suspected a high-CPU workload because I didn't get other information from my friend. Here is what I found:

 

(Image: !runaway output)

 


 

(Image: number of running threads)

 

557 threads were in use at the moment the dump file was generated. According to the Microsoft documentation here and the concerned SQL Server infrastructure (4 logical processors and max worker threads option = 0), the number of running threads is greater than the maximum number of configured SQL Server worker threads, which equals 512 in this case.
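As a side note, the computed maximum and the current worker thread usage can be checked with the following queries (a simple sketch):

-- maximum number of worker threads computed by SQL Server (512 here with 4 logical CPUs and max worker threads = 0)
SELECT max_workers_count FROM sys.dm_os_sys_info;

-- current worker usage per scheduler
SELECT scheduler_id, current_tasks_count, current_workers_count, active_workers_count, work_queue_count
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE';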

 

SQL Server detected 2 sockets with 2 cores per socket and 2 logical processors per socket, 4 total logical processors;

 

We found a lot of thread patterns that seem to be blocked by a database lock.

 

(Image: call stacks)

 

Note that in this case SQL Server has decided to switch the "tied" thread off the scheduler, but it is still waiting to be signaled that the database lock is available before continuing …

Another piece of interesting call stack is the following:

 

(Image: call stacks, second pattern)

 

 

The call stack above seems to concern the SQL Server start-up process. Let's take a look at functions like sqlmin!DBMgr::OpenUnstartedDatabase, sqlmin!DBMgr::StartupDB or sqlmin!IMEDUtil::SynchronzeWithResourceDB. We can also notice the use of the function sqlmin!StartUp::WaitForTempdbReady, which seems to be related to tempdb database readiness. After that, the concerned thread is also waiting on the database lock to become available, and we don't see any function stating that tempdb is ready at the moment of the dump file generation.

 

Well, to summarize, we are facing a general locking issue inside the SQL Server process that has consumed all available worker threads, and we may think that tempdb readiness has something to do with our blocking issue.

So now, let’s go back to the SQL Server error log and let’s have a deeper look at the SQL Server error log event. At this point, I remembered the famous Bob Ward’s session at Summit 2011 about tempdb database in which he explained the importance of tempdb database during the SQL Server start-up process because even if SQL Server is ready for user connections. Indeed, they have to wait until tempdb has really started and it makes sense because we can wonder what’s going on if tempdb database cannot start correctly. Do we allow user connections to perform their work? As a reminder, the unavailability of tempdb is considered as a SQL Server shutdown event.

In my context, we indeed found that the tempdb database was ready only 40 minutes after SQL Server was ready for user connections, as follows:

 

2015-04-16 13:52:38.43 spid5s     Starting up database 'master'.
2015-04-16 13:53:05.76 spid9s     Starting up database 'model'.
2015-04-16 13:53:25.02 spid9s     Clearing tempdb database.
2015-04-16 13:51:48.13 spid13s    SQL Server is now ready for client connections. This is an informational message; no user action is required.
2015-04-16 14:32:12.28 spid9s     Starting up database 'tempdb'.
2015-04-16 14:32:15.71 spid9s     Recovery completed for database tempdb (database ID 2) in 2 second(s)

 

The $100 question: why did tempdb start so slowly? Well, I didn't have all the information, but my friend told me that there was a storage maintenance at the time ... maybe we can correlate these two events, but without any further investigation it is not possible to answer definitively.

Anyway, it was very interesting to see how the tempdb database may trigger an escalation of issues in this case. There are probably a lot of ways to reproduce this issue.

Finally, let me finish this blog post by explaining my test scenario. In fact, to simulate the same issue as my friend, I first created a very big user table inside the model database to considerably slow down the start-up of my tempdb database. Then I configured tempdb with big database files without instant file initialization (zeroing is then mandatory for all database files). At the same time, I used the ostress tool to simulate roughly 600 running threads. Finally, I let the magic happen ...
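For those who want to try something similar, here is a rough sketch of that scenario (the object names are made up, the tempdb logical file name is the default one, and the ostress switches are quoted from memory, so check the ostress help before running it):

-- 1. bloat the model database: tempdb is rebuilt from model at startup, so a huge model slows tempdb down
USE model;
GO
CREATE TABLE dbo.big_filler (id int IDENTITY(1,1), filler char(8000) NOT NULL DEFAULT 'x');
GO
INSERT INTO dbo.big_filler DEFAULT VALUES;
GO 500000

-- 2. oversize the tempdb data file while instant file initialization is disabled (zeroing will be slow)
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 50GB);

-- 3. after a restart, hammer the instance with roughly 600 concurrent sessions (RML utilities)
-- ostress.exe -S. -E -Q"SELECT COUNT(*) FROM master.sys.objects;" -n600 -r1000 -q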

I hope this blog will help you in your daily database administrator work.

Understand the Lifecycle of Alfresco Nodes

Tue, 2015-05-05 12:19


I guess you all already know what a lifecycle is. We are born, we live and we die... In fact, it's exactly the same for Alfresco Nodes! Well, at least from an end user point of view. But what is really going on behind that? This is what I will try to explain in this post.

First of all, what is an Alfresco Node? For most people, a Node is just a document stored in Alfresco, but in reality it's much more than that: everything in Alfresco is a Node! A Node has a type that defines its properties, and it also has associations with other Nodes. This is a very short and simplified description, but we don't need to understand exactly what a Node is to understand the lifecycle process. So as said above, Alfresco Nodes have their own lifecycle, but it's a little bit more complicated than just three simple steps.

Please note that in this post, I will use $ALF_HOME as a reference to the location where Alfresco has been installed (e.g. /opt/alfresco-4.2.c) and $ALF_DATA as a reference to the alf_data location. By default the alf_data folder is $ALF_HOME/alf_data/.


I. Creation


In this post I will use a document as the Alfresco Node, to make the process easy to follow. So this lifecycle starts with the creation of a new document named "Test_Lifecycle.docx" with the following creation date: February the 1st, 2015 at 16:45.

When a document is created in Alfresco, three things are done:

  • File System: the content of this file is stored in the Alfresco "Content Store". The Content Store is by default under $ALF_DATA/contentstore. The file is actually put somewhere under this folder, in a path that depends on the creation time, and an ID is given to this file. For our file, it would be: $ALF_DATA/contentstore/2015/2/1/16/45/408a6980-237e-4315-88cd-6955053787c3.bin.
  • Database: the metadata of this file is stored in the Alfresco database. In fact, in the DB this document is mainly referenced using its NodeRef or NodeID. This NodeRef is something we can see on the Alfresco Web Interface from the document details page (web preview of a document): http://HOSTNAME:PORT/share/page/document-details?nodeRef=workspace://SpacesStore/09a8bd9f-0246-47a8-9701-29436c7d29a6. Please be aware that the NodeRef contains a UUID, but it's not the same as the ID on the Content Store side... Moreover, there is a property in the DB that links the NodeRef to the Content Store's ID, so that Alfresco is able to retrieve the content of a file.
  • Index: an index is created for this file in the search engine (Lucene for older versions of Alfresco or Solr for newer versions). This index is in the "workspace" store.



II. Update, Review, Approve, Publish, aso...


Once the document is created, its life really begins. You can update/review/approve/publish it manually or automatically with different processes. All these actions are part of the active life of a document. From an administration point of view, there isn't that much to say here.


III. Deletion - User level

 
When a document isn't needed anymore, for any reason, a user with sufficient permissions is able to delete it. For our example, let's say that a user deleted our file "Test_Lifecycle.docx" using the Alfresco Share Web Interface on February the 20th, 2015 at 15:30 (19 days after creation). When using the Web Interface or Web Services to delete a document, the "nodeService.deleteNode" method is called. So what happened to our "three things"?

  • FS: nothing changed on the Content Store. The file content is still here.
  • DB: on the DB side, the NodeRef changed from workspace://SpacesStore/09a8bd9f-0246-47a8-9701-29436c7d29a6 to archive://SpacesStore/09a8bd9f-0246-47a8-9701-29436c7d29a6 (the "store_id" field changed on the "alf_node" table).
  • Index: same thing for the search index: the index is moved from the "workspace" store to the "archive" store.


Actually, when a user deletes a document from an Alfresco Web Interface, the document is just moved to a "global trashcan". By default, all users have access to this global trashcan in Alfresco Explorer to restore the documents they may have deleted by mistake. Of course, they can't see all documents but only the ones related to them. In Alfresco Share, the access to this global trashcan is configured in the share-config.xml file, and by default on most versions only administrators have access to it.

The only way to avoid this global trashcan is to programmatically delete the document by applying the aspect "cm:temporary" to it and then calling "nodeService.deleteNode" on it. In that way, the document is removed from the UI and isn't put in the global trashcan.


 IV. Deletion - Administration level

 
What I describe here as the "Administration level" is the second level of deletion that happens by default. This level is the deletion of the document from the global trashcan. As long as the document is still in the global trashcan, administrators (or users, if you are using Alfresco Explorer) can still restore it. If the document is "un-deleted", then it will return exactly where it was before, and of course the metadata and index of this document will be moved from the "archive" store back to the "workspace" store to return to an active life.

On April the 1st, 2015 at 08:05 (40 days after the deletion at user level), an administrator decides to remove "Test_Lifecycle.docx" from the global trashcan. This can be done manually or programmatically. Moreover, there are also some existing add-ons that can be configured to automatically delete elements in the trashcan older than XX days. This time, the "NodeArchiveService.purgeArchiveNode" method is called (archiveService.purge in some older Alfresco versions). So what happened to our "three things" this time?

  • FS: still nothing changed on the Content Store. The file content is still here.
  • DB: on the DB side, the document is still there, but some references/fields (not all) are removed. All references in the "alf_content_data" table are removed, while only some fields are emptied in the "alf_node" table. For Alfresco 4.0 and below, the "node_deleted" field in the "alf_node" table is changed from 0 to 1. In newer versions of Alfresco, the "node_deleted" field doesn't exist anymore, but the QNAME of the node (field "type_qname_id") in the "alf_node" table is changed from 51 ("content") to 140 ("deleted"). So the Node is now deleted from the global trashcan and Alfresco knows that this Node can now be safely removed, but this will not be done right away... Once the "node_deleted" or "type_qname_id" is set, the "orphan_time" field in the "alf_content_url" table for this document is also changed from NULL to the current unix timestamp (+ GMT offset). In our case it will be orphan_time=1427875500540.
  • Index: the search index for this Node is removed.


As you can see, there are still some remaining elements on the File System and in the DB. That's why there is a last step in our lifecycle...
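If you are curious, the orphaned entries can be listed directly in the database with a query along these lines (a sketch based on the table and column names described above; the exact schema may differ between Alfresco versions):

SELECT content_url, orphan_time
FROM alf_content_url
WHERE orphan_time IS NOT NULL
ORDER BY orphan_time;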


V. Deletion - One more step.


As you saw before, the document "Test_Lifecycle.docx" is now considered an "orphaned" Node. In Alfresco, by default, all orphaned Nodes are protected for 14 days. That means that during this period the orphaned Nodes will NOT be touched at all. Of course, this value can easily be changed in the Alfresco configuration files. So what happens after 14 days? Well, in fact, every day at 4am (again, by default... this can be changed), a scheduled job (the "contentStoreCleaner") scans Alfresco for orphaned Nodes older than 14 days. Therefore, on April the 15th, 2015 at 04:00, the scheduled job runs and here is what it does:

  • FS: the content file is moved from $ALF_DATA/contentstore/ to $ALF_DATA/contentstore.deleted/
  • DB: on the DB side, the document is still here but the line related to this document on the "alf_content_url" table (this table contains the orphan_time and reference to the FS location) is removed.
  • Index: nothing to do, the search index was already removed.


You can avoid this step where documents are put in the "contentstore.deleted" folder by setting "system.content.eagerOrphanCleanup=true" in the alfresco-global.properties configuration file. If you do so, after 14 days the document on the File System is not moved but deleted instead.


VI. Deletion - Still one more step...!


As said before, there are still some references to the "Test_Lifecycle.docx" document in the Alfresco database (especially in the "alf_node" table). Another scheduled job, the nodeServiceCleanup, runs every day at 21:00 to clean everything related to Nodes that have been deleted (orphaned Nodes) for more than 30 days. So here is the result:

  • FS: the content file is still on the $ALF_DATA/contentstore.deleted/ folder
  • DB: the DB is finally clean!
  • Index: nothing to do, the search index was already removed.



VII. Deletion - Oh you must be kidding me!?


So many steps, aren't there! As seen before, the only remaining thing to do is to remove the content files from the $ALF_DATA/contentstore.deleted/ folder. You probably think that there is also a job that does that for you after XX days, but that's not the case: there is nothing in Alfresco that deletes the content files from this location. As a consequence, if you want to clean the File System, you will have to do it yourself.

On Unix for example, you can simply create a crontab entry:

50 23 * * * $ALF_HOME/scripts/cleanContentStoreDeleted.sh


Then create this file with the following content:

#!/bin/sh

# ALF_DATA must point to your alf_data location (see the beginning of this post)
ALF_DATA=/opt/alfresco-4.2.c/alf_data
CS_DELETED=$ALF_DATA/contentstore.deleted/

# Remove all files from contentstore.deleted older than 30 days
find $CS_DELETED -type f -mtime +30 | xargs -r rm 2> /dev/null

# Remove all empty folders from contentstore.deleted older than 60 days
# (-mindepth 1 makes sure the contentstore.deleted folder itself is never removed)
find $CS_DELETED -mindepth 1 -type d -mtime +60 -empty | xargs -r rm -r 2> /dev/null


Please be aware that you should be sure that the folder "$ALF_DATA/contentstore.deleted" exists... And when I say that, I mean YOU MUST ABSOLUTELY BE SURE that it exists. Please also never remove anything under the "contentstore" folder, and never remove the "contentstore.deleted" folder itself!

You may wonder why the DB isn't cleaned automatically when the trashcan is cleaned, and why the file content is also kept for 14 days... Well, I can assure you that there are several reasons, and the principal one is backup/restore performance. I will not explain it in detail, but basically, as the file content isn't touched for 14 days by default, you can restore your database up to 14 days in the past and it will still be consistent with the File System, without having to restore the FS! Of course, if you just do that, you will lose the documents uploaded/changed in the last 14 days, because your database wasn't aware of these files 14 days ago. But you can simply backup/restore the content files created in the last 14 days with an incremental backup.

E.g.: today (05-May-2015), if I want to backup the FS (only the last 14 days), then I will have to backup all folders inside $ALF_DATA/contentstore/2015/5 AND all folders inside $ALF_DATA/contentstore/2015/4 with a folder name greater than or equal to 21 (30 days in April + 5 days in May - 14 = 21).


I hope this post was clear enough, because it's true that it can be hard to understand everything regarding the lifecycle of Alfresco Nodes and to deal with it properly.

In-Memory OLTP: Hash and Range indexes

Tue, 2015-05-05 02:04

Since SQL Server 2014 CTP2, Microsoft has introduced a new kind of index: the Range index. We now have two ways of indexing an In-Memory table: using either a Hash index or a Range index.

These two kinds of indexes are quite different.
In fact, a Hash index is a set of buckets, 8-byte memory pointers, which point to the actual row data or row chain data. SQL Server 2014 uses a hash function to map values to buckets. To minimize hash collisions, the number of buckets should be about twice the number of distinct index values. A hash collision occurs when the hash function returns the same bucket value for two different data values, for example "Audi" and "Mercedes". Hash indexes are optimized for equality predicates.
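A quick way to verify that the chosen bucket_count fits the data is to look at the hash index statistics DMV, for example:

-- average/maximum row chain length and empty buckets per hash index
SELECT object_name(hs.object_id) AS table_name,
       i.name                    AS index_name,
       hs.total_bucket_count,
       hs.empty_bucket_count,
       hs.avg_chain_length,
       hs.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats AS hs
JOIN sys.indexes AS i ON i.object_id = hs.object_id AND i.index_id = hs.index_id;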

In contrast, for Range indexes, the leaf level of a Bw-tree gives you a pointer to the first data row in the row chain. As with hash indexes, rows with the same indexed value form a row chain. This kind of index is particularly efficient with inequality predicates like < or >. If you want to know more about Range indexes, please read the excellent blog of my colleague David Barbarin.

To compare these two kinds of indexes I will create a new database with two tables: one In-Memory table with a hash index on a datetime column, and a disk-based table with a non-clustered index on the same column. I will fill these tables with one million records. Here is the script I used:

 

-- database creation

use master

go


if exists(select * from sys.databases where name = 'Test_InMemoryOLTP_Optimized')

   drop database Test_InMemoryOLTP_Optimized

go


create database Test_InMemoryOLTP_Optimized

go


--create a disk table

use Test_InMemoryOLTP_Optimized

go


if exists(select * from sys.objects where name = 'Person_OnDisk')

   drop table dbo.Person_OnDisk

go


create table dbo.Person_OnDisk

(

Person_OnDisk_ID int not null primary key,

Person_OnDisk_Name nvarchar(200) not null,

Person_OnDisk_Date datetime not null,

index Person_OnDisk_Person_OnDisk_Date nonclustered (Person_OnDisk_Date)

)


--enable database for memory optimized tables

-- add memory_optimized_data filegroup

alter database Test_InMemoryOLTP_Optimized

   add filegroup Test_InMemory_mod contains MEMORY_OPTIMIZED_DATA

go


--add container to the filegroup

alter database Test_InMemoryOLTP_Optimized

   add file (name='InMemory_mod', filename='d:\InMemory\InMemory_mod')

   to filegroup Test_InMemory_mod

go


use Test_InMemoryOLTP_Optimized

go


if exists(select * from sys.objects where name = 'Person_InMemory')

   drop table dbo.Person_InMemory

go


create table dbo.Person_InMemory

(

Person_InMemory_ID int not null primary key

       nonclustered hash with (bucket_count = 2000000),

Person_InMemory_Name nvarchar(200) COLLATE Latin1_General_100_BIN2 NOT NULL,

Person_InMemory_Date datetime not null index Person_InMemory_Person_InMemory_Date

nonclustered hash (Person_InMemory_Date) with (bucket_count = 2000000)

)

with (memory_optimized = on, durability = schema_and_data)


--fill in disk table

set nocount on

go

begin tran

declare @cpt int = 0

While @cpt < 1000000

begin

   insert into dbo.Person_OnDisk values

       (@cpt, 'Name_'+cast(@cpt as varchar(10)), dateadd(dd,round(@cpt/10000,0,1),'2014-01-01'))

       set @cpt += 1

end

commit


--fill in In-Memory table

set nocount on

go

begin tran

declare @cpt int = 0

While @cpt < 1000000

begin

   insert into dbo.Person_InMemory values

       (@cpt, 'Name_'+cast(@cpt as varchar(10)), dateadd(dd,round(@cpt/10000,0,1),'2014-01-01'))

       set @cpt += 1

end

commit

 

First remark: filling the In-Memory table is three times faster than filling the disk-based table, respectively 5 seconds for the former against 15 seconds for the latter on my machine. You can also see that I decided to persist my In-Memory table data, as I used the durability option schema_and_data instead of schema_only; with schema_only, after a restart of my SQL Server service the data would not be persistent and would definitively be lost.

 

Now, I will search for records where the date is equal to 2014-01-14 and compare the results between my In-Memory and my disk-based tables. I will use the IO and TIME statistics options to get a better view of each query's performance:

 

-- use equality operator with disk table and In-Memory table with Hash index

USE Test_InMemoryOLTP_Optimized

GO

 

SET NOCOUNT ON

SET STATISTICS IO ON

SET STATISTICS TIME ON

 

SELECT *

FROM dbo.Person_OnDisk

WHERE Person_OnDisk_Date = '2014-01-14'

 

SELECT *

FROM dbo.Person_InMemory

WHERE Person_InMemory_Date = '2014-01-14'

 

The result of those queries is:

 

Table on disk:

Table 'Person_OnDisk'. Scan count 1, logical reads 5812, physical reads 3, read-ahead reads 5805, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

 

SQL Server Execution Times:

   CPU time = 31 ms, elapsed time = 1631 ms.

 

 

Table In-Memory:

SQL Server Execution Times:

   CPU time = 0 ms, elapsed time = 128 ms.

 

The disk-based table made multiple logical reads (5812), three physical reads and 5805 read-aheads, whereas the In-Memory table, as it resides in memory, did not perform any disk I/O activity.
Concerning the execution time, my disk-based table consumed 31 ms of CPU and ran in 1631 ms, against no measurable CPU and 128 ms for my In-Memory table...
The In-Memory table is definitely faster than my disk-based table.

 

Let's see now what happens with an inequality predicate. So I am using the same queries with inequality operators (> and <):

 

SET NOCOUNT ON

SET STATISTICS IO ON

SET STATISTICS TIME ON

 

SELECT *

FROM dbo.Person_OnDisk

WHERE Person_OnDisk_Date > '2014-01-14' and Person_OnDisk_Date < '2014-02-20'

 

 

SELECT *

FROM dbo.Person_InMemory

WHERE Person_InMemory_Date > '2014-01-14' and Person_InMemory_Date < '2014-02-20'

 

 

The result is:

 

Disk table:

Table 'Person_OnDisk'. Scan count 1, logical reads 5812, physical reads 3, read-ahead reads 5805, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

 

SQL Server Execution Times:

   CPU time = 171 ms, elapsed time = 3150 ms.

 

 

In-Memory table:

SQL Server Execution Times:

CPU time = 204 ms, elapsed time = 2153 ms.

 

The I/Os for both tables did not change, but for the In-Memory table we see that the gain is not as interesting as with the equality predicate seen before. The Hash index is not tuned for inequality predicates. I will now change the Hash index of my In-Memory table to a Range index. To do so, I will have to drop and recreate my table, as it is not possible to alter an In-Memory table, and then reload the data.

 

--Change the Hash index by a Range index

if exists(select * from sys.objects where name = 'Person_InMemory')

   drop table dbo.Person_InMemory

go

 

create table dbo.Person_InMemory

(

Person_InMemory_ID int not null primary key

       nonclustered hash with (bucket_count = 2000000),

Person_InMemory_Name nvarchar(200) COLLATE Latin1_General_100_BIN2 NOT NULL,

Person_InMemory_Date datetime not null index Person_InMemory_Person_InMemory_Date

nonclustered (Person_InMemory_Date)

)

with (memory_optimized = on, durability = schema_and_data)

 

 

In-Memory table:

 

SQL Server Execution Times:

   CPU time = 47 ms, elapsed time = 2155 ms.

 

The CPU time is now more than four times lower with a Range index compared to a Hash index.
If I take a look at the execution plan, I can see that the optimizer uses the range index defined in my script with an index seek operation, versus a table scan when using a hash index with an inequality operator.

 

Before creating an In-Memory table, remember that you will have to think about which kind of predicates will be used in the queries in order to choose the right index: Hash or Range, or even both. In addition, don't forget that an In-Memory table accepts a maximum of eight indexes. Finally, if you need to create another index for an In-Memory table, you will have to drop and recreate the table.
I hope this blog will help. If you want to know more about SQL Server 2014 In-Memory OLTP, please join us at our next event in June; details and registration here.

Variations on 1M rows insert (3): TimesTen

Mon, 2015-05-04 23:00

In the previous blog post I measured how inserting 1 million rows takes 5 minutes instead of 45 seconds when we commit at each row inserted. In Oracle we can improve that down to 2 minutes if we can accept (and manage) the loss of transactions following an instance crash.
Let's go further. For OLTP workloads that need to face a high rate of transactions, Oracle has TimesTen, which is a Database In-Memory - meaning that it is optimized for in-memory structures. My test case has a 1-million-row table that can fit in memory, so let's try it.

Test case

I am still using the Oracle Developer Day 11g VM which has TimesTen installed.

I've configured my database in /home/oracle/app/oracle/product/TimesTen/tt1122/info/sys.odbc.ini

[sampledb_1122]
Driver=/home/oracle/app/oracle/product/TimesTen/tt1122/lib/libtten.so
DataStore=/home/oracle/app/oracle/product/TimesTen/tt1122/info/DemoDataStore/sampledb_1122
PermSize=400
TempSize=320
PLSQL=1
DatabaseCharacterSet=US7ASCII
I used the sample one, but I increased the memory sizes to 400MB and 320MB for PERM_ALLOCATED_SIZE and TEMP_ALLOCATED_SIZE.

ttIsql

So I run the TimesTen command line interpreter:

[oracle@localhost ~]$ ttisql

Copyright (c) 1996-2011, Oracle.  All rights reserved.
Type ? or "help" for help, type "exit" to quit ttIsql.
check the version
Command> version
TimesTen Release 11.2.2.2.0
connect to my database
Command> connect "dsn=sampledb_1122";
Connection successful: DSN=sampledb_1122;UID=oracle;DataStore=/home/oracle/app/oracle/product/TimesTen/tt1122/info/DemoDataStore/sampledb_1122;DatabaseCharacterSet=US7ASCII;ConnectionCharacterSet=US7ASCII;DRIVER=/home/oracle/app/oracle/product/TimesTen/tt1122/lib/libtten.so;PermSize=40;TempSize=32;TypeMode=0;
(Default setting AutoCommit=1)
and I want to manage commits myself
Command> set autocommit 0;
and then I'm doing the same as in previous posts:
Command> set echo on;

Command> create table DEMO ("id" number primary key, "text" varchar2(15), "number" number);

Command> set timing on;
There is something that is very nice about TimesTen: it supports PL/SQL. I'm running exactly the same as I did in Oracle:
Command> declare
 type people_array is varray(12) of varchar(15);
 people people_array := people_array( 'Marc', 'Bill', 'George', 'Eliot', 'Matt', 'Trey', 'Tracy','Greg', 'Steve', 'Kristina', 'Katie', 'Jeff' );
 people_count number :=people.COUNT;
 n number;
begin
 for i in 0..1e6 loop
  n:=trunc(dbms_random.value(0,10000));
  insert into DEMO values( i , people( dbms_random.value(1,people_count) ) , n );
  commit;
 end loop;
 commit;
end;
/

PL/SQL procedure successfully completed.

Execution time (SQLExecute) = 21.010554 seconds.
21 seconds. Yes, that's fast...

Durable commits

Do you remember how we improved the commits in the last post? When accepting the risk of non-durable commits, the response time was faster on the Oracle database because we don't have to wait for the persistence of commits.

Here in TimesTen, it's an In-Memory database and by default transactions are not durable. Transactions are logged for rollback, but not for recovery, which means that nothing goes to disk. If the instance crashes, data is lost. If I want the durability of the ACID properties, then we have to write the transaction log to disk.
This is activated by the following property in the sys.odbc.ini:

DurableCommits=1;

I connect:
Command> connect "dsn=sampledb_1122";
Connection successful: DSN=sampledb_1122;UID=oracle;DataStore=/home/oracle/app/oracle/product/TimesTen/tt1122/info/DemoDataStore/sampledb_1122;DatabaseCharacterSet=US7ASCII;ConnectionCharacterSet=US7ASCII;DurableCommits=1;DRIVER=/home/oracle/app/oracle/product/TimesTen/tt1122/lib/libtten.so;PermSize=40;TempSize=32;TypeMode=0;
(Default setting AutoCommit=1)
and run exactly the same as before:
PL/SQL procedure successfully completed.

Execution time (SQLExecute) = 31.316902 seconds.
It's longer but still faster than the Oracle database. TimesTen is optimized for that.

Conclusion

Note that we have durability here, thanks to the persistence of the transaction log, as long as we set DurableCommits=1. However, in case of an instance crash, the recovery may be long because a lot of transactions must be re-done. It's an In-Memory database: it's fast when it's up, but you don't have the same availability as with a database that is optimized to store data on disk. You have to tune the checkpoints in order to balance performance and availability.

The TimesTen In-Memory database can be a good solution to accelerate some use cases. Its compatibility with Oracle SQL and PL/SQL is very good. You can install it standalone or as a cache for Oracle Database. Licensing is 50% of Oracle Enterprise Edition.

TimesTen is also called 'Database In-Memory', which is a bit misleading because Oracle also has the 12c In-Memory option, which is very different. Among competitors, TimesTen can be compared to SQL Server In-Memory OLTP. Because it's a unique event - a cross-technology one (MSSQL, ORA, HANA) - I don't hesitate to link again to our free event In-Memory: boost your IT performance! in Switzerland (Lausanne, Basel and Zürich).

Talking about cross-technology expertise, don't miss Daniel's variation on 1M rows insert into PostgreSQL and with synchronous commit.

EMC World Las Vegas – Momentum 2015 first day

Mon, 2015-05-04 18:34

Before starting to talk about the EMC event, I would just like to share my feelings about the city where it takes place. It is Las Vegas; ok, what an amazing, impressive place! On top of that, we are in one of the most beautiful hotels on the strip! Congratulations to EMC and thanks for this attention.

 

For this first day of conferences, I decided to attend a first session about the xCP platform. I usually manage content with D2 and wanted to get some background on its process-oriented sibling, xCP. Then I will explain a bit further the news and forecasts about the underlying content management platform, Documentum itself.

 

xCP is one of the leading applications EMC wants to promote, together with D2, within its enterprise content management offering. In summary, xCP is described as a "rapid application development platform". This means it helps developers and business analysts build applications while reducing real development work as much as possible, by providing an "extensive platform for building case management and business process solutions".

 

In fact, it aims to build applications graphically by putting together several items, like forms and operational building blocks, organized through processes to provide the functionalities business teams are looking for. Such an approach also aims to reduce development costs and improve maintainability over time for applications that are more complex than simple faceted records management.

 

Meanwhile, EMC released the Documentum 7.2 platform with its several clients: D2 4.5, xCP 2.2 and Webtop 6.8.

 

In this release we can see several areas of improvement, for example security. With Documentum 7.2, we can now store and transmit content using AES 256-bit encryption, currently considered by the US NSA good enough to meet Top Secret protection criteria.

 

This version also provides enhanced capabilities through REST web services for searches, facet navigation and batch transactions, and indeed further integration extensibility for third-party software.

 

xPlore 1.5 also provides its own growing set of functionalities, like cross-repository subscriptions, privacy options, enhanced language word splitting and improved warm-up. It is also good to know that it keeps backwards compatibility with Documentum 6.7 SP2.

 

For the next upgrades, EMC also provides higher-level migration tools, like the one for automatic encryption upgrade in Documentum.

 

This day was also very rich in terms of EMC corporate communication about overall product policies and strategies for the coming years and releases.

 

I hope you enjoyed reading this short summary of the first day at EMC World – Momentum 2015, and I would like to thank you for your attention.

Variations on 1M rows insert(2): commit write - PostgreSQL

Mon, 2015-05-04 16:04
Franck was quite fast with his second post Variations on 1M rows insert(2): commit write. Let's see what we can do on the PostgreSQL side.

Variations on 1M rows insert (1): bulk insert - PostgreSQL

Mon, 2015-05-04 11:30

When I read Franck's post about Variations on 1M rows insert (1): bulk insert, I thought doing quite the same in PostgreSQL might be interesting. Let's start by using the same test tables, one with a primary key and the other one without a primary key:

create table DEMO ("id" int , "text" varchar(15), "number" int);

create table DEMO_PK ("id" int , "text" varchar(15), "number" int,
                  constraint demo_pk_pk primary key (id) ) ;

postgres=# \d demo*
            Table "public.demo"
 Column |         Type          | Modifiers 
--------+-----------------------+-----------
 id     | integer               | 
 text   | character varying(15) | 
 number | integer               | 

           Table "public.demo_pk"
 Column |         Type          | Modifiers 
--------+-----------------------+-----------
 id     | integer               | not null
 text   | character varying(15) | 
 number | integer               | 
Indexes:
    "demo_pk_pk" PRIMARY KEY, btree (id)

   Index "public.demo_pk_pk"
 Column |  Type   | Definition 
--------+---------+------------
 id     | integer | id
primary key, btree, for table "public.demo_pk"

I am using a VirtualBox VM with Oracle Linux 7.1 (64bit), 512 MB of memory and one vCPU for the tests:

cat /etc/oracle-release 
Oracle Linux Server release 7.1
cat /proc/meminfo | head -1
MemTotal:         502612 kB
cat /proc/cpuinfo | grep proc
processor	: 0



Similar to Franck's PL/SQL block, I'll use a PL/pgSQL block for the first tests:

\timing on
show autocommit;
truncate table DEMO;
DO $$DECLARE
  l_people_array varchar(15)[12] := '{"Marc", "Bill", "George", "Eliot", "Matt", "Trey", "Tracy","Greg", "Steve", "Kristina", "Katie", "Jeff"}';
  n int;
BEGIN
  for i in 0..1e6 loop
     n:=trunc(random()*1000+1);
     insert into DEMO values( i , l_people_array[floor((random()*11))+1::int] , n );
  end loop;
END$$;

(The code for the demo_pk table is exactly the same, except for the table name). It does not do exactly the same as Franck's PL/SQL block, but it does comparable things, I believe :)

The PostgreSQL version I'll use is 9.4.1:

postgres=# select version();
                                                         version                                                          
--------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.4.1 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
(1 row)


Lets start by executing the test without a primary key and thus no index:

postgres=# \i 1mio_rows_no_pk.sql
Timing is on.
 autocommit 
------------
 on
(1 row)

Time: 0.183 ms
TRUNCATE TABLE
Time: 71.339 ms
DO
Time: 6861.547 ms

Not so bad. Doing the same with a primary key and thus an index to maintain:

postgres-# \i 1mio_rows_pk.sql
Timing is on.
 autocommit 
------------
 on
(1 row)

Time: 0.494 ms
TRUNCATE TABLE
Time: 37.314 ms
DO
Time: 10900.631 ms

Around 3 seconds more, but again: not so bad. As there is no "bulk collect" or "bulk insert" in PostgreSQL, I cannot do the same tests as Franck. But PostgreSQL has a copy command, so let's look at that. First I'll unload the table to a flat file so I have something to load:

postgres=# copy demo to '/home/postgres/demo.txt';
COPY 1000001
Time: 239.161 ms

Believe it or not: it took 240 ms to unload 1000001 rows from a table to a flat file. Now let's do the opposite and load the data back into the table using the copy command. The script used for doing this looks like this for both variations (with and without primary key):

\timing on
show autocommit;
truncate table demo;
copy DEMO from '/home/postgres/demo.txt';

First one without the primary key:

postgres=# \i 1mio_rows_no_pk_copy.sql
Timing is on.
 autocommit 
------------
 on
(1 row)

Time: 0.663 ms
TRUNCATE TABLE
Time: 32.026 ms
COPY 1000001
Time: 1877.480 ms

The second one with the primary key:

postgres=# \i 1mio_rows_pk_copy.sql
Timing is on.
TRUNCATE TABLE
Time: 33.370 ms
COPY 1000001
Time: 6227.844 ms

This makes a big difference: 1.8 seconds compared to 6.2 seconds. But still amazingly fast. For those who know PostgreSQL a little bit and might wonder about fsync:

postgres=# show fsync;
 fsync 
-------
 on
(1 row)

If I turn this off, without the primary key:

postgres=# alter system set fsync='off';
ALTER SYSTEM
Time: 62.066 ms
postgres=# select pg_reload_conf();
LOG:  received SIGHUP, reloading configuration files
LOG:  parameter "fsync" changed to "off"
 pg_reload_conf 
----------------
 t
(1 row)

Time: 7.253 ms
postgres=# show fsync;
 fsync 
-------
 off
(1 row)

Time: 0.601 ms
postgres=# \i 1mio_rows_no_pk_copy.sql
Timing is on.
 autocommit 
------------
 on
(1 row)

Time: 0.277 ms
TRUNCATE TABLE
Time: 10.064 ms
COPY 1000001
Time: 611.455 ms
postgres=# \i 1mio_rows_pk_copy.sql
Timing is on.
TRUNCATE TABLE
Time: 11.768 ms
COPY 1000001
Time: 4674.273 ms

Around 0.6 seconds without the primary key and around 4.7 seconds with the primary key. But remember that you might lose data if fsync is off and the server crashes during the load.
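By the way, a less drastic trade-off than fsync=off is to relax only the commit durability with synchronous_commit, which keeps the data files consistent in case of a crash (only the most recent transactions can be lost); a minimal sketch:

alter system set synchronous_commit='off';
select pg_reload_conf();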

Let's wait and see what Franck will do in the next post of his series.

Variations on 1M rows insert(2): commit write

Mon, 2015-05-04 10:39

In the previous blog post I measured the time it takes to insert 1 million rows: about 45 seconds when done row-by-row, which goes down to 7 seconds when inserted in bulk. In both cases, the commit was done only at the end. But in some cases you need to commit at each row, in order to make it visible to other sessions as soon as it is inserted (think of several sessions inserting concurrently). And commit also makes it durable - available even after an instance crash - which supposes that the change is written to disk. What's the cost of that?

Test case

Of course, I cannot do a bulk insert for that as I want to commit for each row. I take the same test as in the previous post: a table with a primary key (so only one index):

create table DEMO ("id" number , "text" varchar2(15), "number" number) ;

And the same PL/SQL except that I've added a COMMIT in the loop:

declare
 type people_array is varray(12) of varchar(15);
 people people_array := people_array( 'Marc', 'Bill', 'George', 'Eliot', 'Matt', 'Trey', 'Tracy','Greg', 'Steve', 'Kristina', 'Katie', 'Jeff' );
 people_count number :=people.COUNT;
 n number;
begin
 for i in 0..1e6 loop
  n:=trunc(dbms_random.value(0,10000));
  insert into DEMO values( i , people( dbms_random.value(1,people_count) ) , n );
  commit;
 end loop;
 commit;
end;
/

My test environment is still the Oracle Developer Day 11g VM with 1 CPU allocated from by VirtualBox.

 

simple commit in PL/SQL

Yes, this is a special case. I'm running that in a PL/SQL block, and PL/SQL has a special optimization here because the 'commit successful' message is given to the user only at the end. The intermediate commits are then lighter than commits that return to the client. However, look at this 1 million insert + commit execution:

PL/SQL procedure successfully completed.
Elapsed: 00:02:16.34

We are far from the 45 seconds of row-by-row inserts that were committed only at the end. Commit has a big overhead.

 

commit write wait immediate

If you run the inserts from a client, doing a user call for each row, the special optimization is not there. In that case the commit waits for the log writer and returns only when the log writer has acknowledged the write to disk (and to the SYNC standby database if you are in Data Guard). We can do the same from PL/SQL if we issue a COMMIT WRITE WAIT IMMEDIATE instead of a simple commit. So that's a good way to compare the time without introducing additional roundtrip latency.

 for i in 0..1e6 loop
  n:=trunc(dbms_random.value(0,10000));
  insert into DEMO values( i , people( dbms_random.value(1,people_count) ) , n );
  commit write wait immediate;
 end loop;

When executing my procedure with the loop above, here is the time it takes:

PL/SQL procedure successfully completed.
Elapsed: 00:05:06.57

Time has doubled. This is the most expensive insert you can do. But it's also the only way if you want to:

  • Give commit feedback to the user at each commit
  • Ensure that the commit is done as quickly as possible (this is the IMMEDIATE write), favoring response time over throughput.
  • Ensure that the commit is durable (the D in ACID), meaning that we wait until the changes are on persistent storage (this is the WAIT write)

As we can see here, those requirements have a cost.

 

commit write wait batch

We can favor the throughput with the BATCH commit logging instead of IMMEDIATE.

 for i in 0..1e6 loop
  n:=trunc(dbms_random.value(0,10000));
  insert into DEMO values( i , people( dbms_random.value(1,people_count) ) , n );
  commit write wait batch;
 end loop;

The gain is not very high here, as I have only one session doing all those inserts:

PL/SQL procedure successfully completed.
Elapsed: 00:04:28.97

but the redo size is reduced. I plan to show more statistics and wait events about it in a future post. The time spent here in the WAIT commits is mostly in those orange 'log file sync' waits we don't like to see on the Enterprise Manager screen. That is the only point where the session process waits for work that is expected to be done asynchronously in the background. At commit, the session waits for the redo to be persisted.

 

commit write nowait immediate

If you are ok with losing a transaction even when the recent commit was successful, then you don't have to wait. That seems heretical to ACID aficionados, but it is totally acceptable for supporters of 'eventual consistency'. I prefer to think about it according to the business requirements.

When something external to the database occurs once the commit is successful, then you must trust in ACID and you have to wait. When your customer withdraws money at the ATM, then once they have their money you must be sure that the operation is committed in the database. A crash of the database should not put the money back into their account, because they will not give back the bills.

However, when you insert events coming from sensors (think Big Data), you probably want to keep them for a while in the database, but you can accept losing a few of them in the rare case of a server crash, especially if that greatly improves the performance and scalability of the process. Maybe you can accept losing some transactions, or you are able to load them back again after a crash.
Think of it like NOLOGGING operations: it's not what you want by default, but if you are sure about the business requirements and the recovery procedure, then you have new ways to improve performance and scalability.

 for i in 0..1e6 loop
  n:=trunc(dbms_random.value(0,10000));
  insert into DEMO values( i , people( dbms_random.value(1,people_count) ) , n );
  commit write nowait immediate;
 end loop;

And the improvement is obvious:

PL/SQL procedure successfully completed.
Elapsed: 00:02:08.75

When you don't wait, obviously, you improve the response time and the scalability.

 

commit write nowait batch

And we can batch the redo record, as we did above, in addition to nowait:

 for i in 0..1e6 loop
  n:=trunc(dbms_random.value(0,10000));
  insert into DEMO values( i , people( dbms_random.value(1,people_count) ) , n );
  commit write nowait batch;
 end loop;

And the improvement is obvious:

PL/SQL procedure successfully completed.
Elapsed: 00:01:45.33

Let the log writer do its job in batch, without waiting for its acknowledgement, and the row-by-row commit overhead is minimized. And it's ok as long as you know what you are doing:

  • You can manage the loss of your transactions in case of a server crash, either by re-executing them or because the missing changes have no consequence.
  • You do a WAIT commit when you return a commit status to the user, because they can do something you don't manage and which depends on that commit status.

statement level, session level, instance level

I show only the statement-level way to manage commit wait and commit logging, because it's a decision that must be made by the developers. You can choose the same behavior with the commit_wait and commit_logging session parameters, but that's probably not the right option. As I said before, the commit behavior probably depends on where it is in the code. The final commit before returning to the user is probably managed differently.
For the same reason, you probably don't change those parameters at PDB or instance level. But there are always exceptions. If you are doing an application migration that is using the row-by-row API to insert data into the new system, then maybe you can do all that in NOWAIT BATCH by setting instance parameters. If it fails, you restart from the beginning (flashback database helps here).
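For completeness, here is what the session-level variant looks like (the same commit_wait and commit_logging parameters can also be set at PDB or instance level):

alter session set commit_wait='NOWAIT';
alter session set commit_logging='BATCH';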

This kind of optimization also exists in SQL Server. David Barbarin has blogged about SQL Server delayed durability previously. Want to meet our experts from all technologies in one day? Come to our event In-Memory: boost your IT performance!, where we talk about SQL Server, Oracle and SAP HANA.

And as we are talking about cross technology, stay tuned to see 1M row insert variations on other database systems...

EMC World 2015 - Day 1 at Momentum

Mon, 2015-05-04 08:07

The first day of my first EMC World conference, and especially of the Momentum part, which covers the Enterprise Content Division (ECD) products, solutions, strategies and so on. The start was great: being in Las Vegas, where you have the feeling you are on another planet, I had the same feeling during the General Session and the ECD Keynote; each time good explanations coupled with good shows.

The information I got was interesting, and some questions came to my mind; questions that I hope can be answered in the next days.

InfoArchive

Before attending the General Session I went to another one about EMC InfoArchive. Today I work mainly with the Documentum Content Server and the products around it, like xPlore, ADTS, D2 and so on.

To be prepared for future customer requests and challenges, I wanted to see what is behind InfoArchive. Let me give some points:

- One main goal of using InfoArchive is to reduce the cost of the storage and to keep the assets.

- Once legacy applications are shut down, you can archive their data into InfoArchive. You can also use it to archive data from active applications, where you can build rules to define which data will be moved to InfoArchive. And this can be done for flat and complex records as well as, of course, for document records.

- Once the data is saved into InfoArchive, you can use XQuery and XForms to retrieve the data and display it in the way the user wants to see it.

So much for the general overview. From a technical point of view, here is some information:

- The Archive Service is built using a Data Service (xDB data server) and/or a Content Server. In case you have to archive only metadata, the xDB service is sufficient.

- The storage to be used is obviously EMC storage, but other storage can also be used, meaning this solution can be implemented in more types of infrastructures.

- To the question of what is archived, the answer is SIPs (Submission Information Packages). You have a SIP descriptor and SIP data (metadata and/or content).

- LWSO objects are stored to use less storage

- The search is done first against the AIP (Archive Info Package) and, once the object is found, against the AIU (Archive Info Unit). There is no fulltext available on the InfoArchive layer; the reason is that an archive system does not use it in general.

- RPS can be used to manage the retention.

Open questions

So that for the "facts", now there are some other open points which could be raised in case InforArchive will be used. You can save you data in normal XML formats but you can also define how the data are saved and how you want to search them. In this case who will manage that, the Record&Archive team or do you need first a business analyste? Can the defined model easily be changed for the current archived information? There are technical questions but I think the organization has first to be defined to have a successfull implementation of InfoArchive

Again, some questions are coming in my mind. And again, let's see if I can have some answers in ... the next days.

Get the main Administration Information from SQL Server with PowerShell

Mon, 2015-05-04 07:48

In my previous blog Automate SQL Server Administration with PowerShell – How to retrieve SQL Server services?, I presented you the first step of the SQL Server administration through an automated process.

This blog is a follow-up of the previous one, and it will focus on retrieving the information of a SQL Server instance with PowerShell.

 

Disclaimer: I am not a developer but a SQL Server dba. If you find errors or some ways of improvement, I will be glad to read your comments!

 

List all SQL Server instances

To be able to proceed for all the instances, you can easily get all your instance names with this function:

Get-SQLInstances_function_20150504-135411_1.png
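The function itself is shown in the screenshot above. As it is not reproduced here, here is a minimal sketch of what such a function could look like, assuming it simply reads the installed instance names from the registry of the local SQL Server host:

function Get-SQLInstances
{
    # installed instance names are listed in this registry value on the SQL Server host
    $key = 'HKLM:\SOFTWARE\Microsoft\Microsoft SQL Server'
    (Get-ItemProperty -Path $key -Name InstalledInstances).InstalledInstances
}

Get-SQLInstances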

 

Retrieve SQL Server instance information

In my example, I execute my script on the machine hosting my SQL Server instance.

I use SMO objects to access the instance information. But you need the full instance name, as follows:

full_instance_name.png

I only give the instance name as parameter because I execute my script on a local server; otherwise I would also need to give the server name as parameter.

 

First I initialize my SMO object of my instance like this:

instance_smo_object.png

 

This SMO object contains the main information of the SQL Server instance. To list all the properties and methods of the object, proceed as follows:

service_properties_and_methods_20150504-141841_1.png
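As the screenshots are not reproduced here, the two steps above could look roughly like this. The instance name 'MYINSTANCE' is just a placeholder, and the SMO assemblies are assumed to be installed with the SQL Server management tools:

# load the SMO assembly
[void][System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.Smo')

# full instance name = <server name>\<instance name>, here built for a local named instance
$instanceFullName = "$env:COMPUTERNAME\MYINSTANCE"

# initialize the SMO object of the instance
$server = New-Object Microsoft.SqlServer.Management.Smo.Server($instanceFullName)

# list all properties and methods of the object
$server | Get-Member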

 

To list the general information of  the instance, you can proceed like this:

instance_general_information.png

To list the directory paths related to the instance, here is an example:

instance_directories.png

To list important instance configuration, here is an example:

instance_configuration.png
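Here is a sketch of the kind of properties the three screenshots show. The names below are standard SMO Server properties, but the exact selection in the original screenshots may differ:

# general information of the instance
$server | Select-Object NetName, InstanceName, Edition, VersionString, Collation

# directory paths related to the instance
$server | Select-Object RootDirectory, DefaultFile, DefaultLog, BackupDirectory, ErrorLogPath

# some important configuration values
$server.Configuration.MaxDegreeOfParallelism.ConfigValue
$server.Configuration.MaxServerMemory.ConfigValue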

 

By formatting the information you retrieve from the instance SMO object, you can generate reports, audit your environment or whatever!

The following capture is an existing dashboard from our Database Management Kit (DMK).

dmk_instance_information.png

 

Next steps

The SMO object for the SQL Server instance has a limited number of properties and methods. Sometimes, you need information which is not present in the object. In this case, you must use the "sqlcmd" command and retrieve your information by using T-SQL.

Here is the way to proceed:

invoke_sqlcmd_command.png

To retrieve any SQL Server instance information, the "sqlcmd" command will always work. You can also use it to modify the instance configuration.
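As an illustration (the query below is just an example, not necessarily the one from the screenshot), the PowerShell counterpart Invoke-Sqlcmd can be used like this, assuming the SQLPS module is available:

# load the module providing Invoke-Sqlcmd
Import-Module SQLPS -DisableNameChecking

# run any T-SQL statement against the instance and get the result back as objects
Invoke-Sqlcmd -ServerInstance $instanceFullName `
              -Query 'SELECT name, value_in_use FROM sys.configurations ORDER BY name;'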

 

I hope this blog will help you in your work. In my next blog, I will show you how to access your database information with PowerShell.

Get SQL Server services with PowerShell

Mon, 2015-05-04 02:13
 

SQL Server Configuration Manager and SQL Server Management Studio are the main tools to administrate the components of SQL Server. They are very convenient to use and pretty complete.
But as soon as you want an automated process, these tools have their limitations. Nevertheless, there is a solution: PowerShell!

This blog introduces a first step towards an automation process of SQL Server administration. I will retrieve all SQL Server services related to a specific instance name.

The process is always similar: it uses the SMO WMI Server PowerShell object.

 

Disclaimer: I am not a developer but a SQL Server dba. If you find errors or some ways of improvement, I will be glad to read your comments!

 

SQL Engine

To retrieve the SQL Engine service for a specific instance name:

Get-SQLEngine_function.png
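The function is only shown as a screenshot; here is a minimal sketch of how it could be written with the SMO WMI ManagedComputer object mentioned above (the assembly is assumed to be installed with the SQL Server management tools):

function Get-SQLEngine
{
    param([string]$instanceName)

    # load the assembly containing the SMO WMI classes
    [void][System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SqlWmiManagement')

    $wmi = New-Object Microsoft.SqlServer.Management.Smo.Wmi.ManagedComputer $env:COMPUTERNAME

    # the default instance service is 'MSSQLSERVER', a named instance service is 'MSSQL$<name>'
    if ($instanceName -eq 'MSSQLSERVER') { $serviceName = 'MSSQLSERVER' }
    else { $serviceName = 'MSSQL$' + $instanceName }

    $wmi.Services | Where-Object { $_.Name -eq $serviceName }
}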

  SQL Agent

To retrieve the SQL Agent service for a specific instance name:

Get-SQLAgent_function.png

  SQL Full-text Filter

To retrieve the SQL Full-text Filter service for a specific instance name:

Get-SQLFullTextFilter_function.png

  SQL Browser

To retrieve the SQL Browser service:

Get-SQLBrowser_function.png

  SQL Analysis

To retrieve the SQL Analysis service for a specific instance name:

Get-SQLAnalysis_functionpng.png

  SQL Reporting

To retrieve the SQL Reporting service for a specific instance name:

Get-SQLReporting_function.png

  SQL Integration

To retrieve the SQL Integration service:

Get-SQLIntegration_function.png

  Service Object

Each function returns an object with the following properties and methods:

service_properties_and_methods.png

You are able to start, restart or stop your service. But you can also retrieve specific information such as the service account, the start mode or the service state.
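For example (Get-SQLEngine being the function sketched earlier and 'MYINSTANCE' a placeholder), the returned WMI service object can be used like this:

$service = Get-SQLEngine -instanceName 'MYINSTANCE'

# a few interesting properties
$service.ServiceAccount   # account running the service
$service.StartMode        # Auto, Manual or Disabled
$service.ServiceState     # Running, Stopped, ...

# and the methods to control it
$service.Stop()
$service.Start()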

  Next Step

If you do not want to proceed just for a specific instance, but for all instances, you can list all instance names in that way:

Get-SQLInstances_function.png

Then, with your list of instance names, you loop by calling each function. Do not forget to test whether the returned service exists (by testing if it is null).
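A rough sketch of such a loop, using the functions from the screenshots above (Get-SQLInstances and Get-SQLEngine were sketched earlier; Get-SQLAgent and the others follow the same pattern):

foreach ($instanceName in Get-SQLInstances)
{
    $engine = Get-SQLEngine -instanceName $instanceName
    $agent  = Get-SQLAgent  -instanceName $instanceName

    # a service may not exist for every instance, so test for null before using it
    if ($engine -ne $null) { $engine | Select-Object Name, ServiceState, ServiceAccount }
    if ($agent  -ne $null) { $agent  | Select-Object Name, ServiceState, ServiceAccount }
}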

 

To finish this article: all these functions are part of our Database Management Kit (DMK) developed by our team. We use it to access common and standard information faster, but also to automate processes.

For example, the DMK is able (in just one command!) to perform a security audit of your SQL Server environment, following the best practices from Microsoft and from our experts. A report is generated at the end of the audit listing all the security points to review.

getting started with postgres plus advanced server (3) - setting up a hot standby server

Sat, 2015-05-02 02:42

So, we have a ppas 94 database up and running and we have a backup server for backing up and restoring the database. Now it is time to additionally protect the database by setting up a hot standby database. This database could even be used to offload reporting functionality from the primary database as the standby database will be open in read only mode. Again, I'll use another system for that so that the system overview looks like this:

server        ip address        purpose
ppas          192.168.56.243    ppas database cluster
ppasbart      192.168.56.245    backup and recovery server
ppasstandby   192.168.56.244    ppas hot standby database


As the standby database will need the ppas binaries just follow the first post for setting this up again. Once the binaries are installed and the database is up and running I'll completely destroy it but keep the data directory:

[root@oel7 tmp]# service ppas-9.4 stop
Stopping Postgres Plus Advanced Server 9.4: 
waiting for server to shut down.... done
server stopped
[root@oel7 tmp]# rm -rf /opt/PostgresPlus/9.4AS/data/*
[root@oel7 tmp]# 

Ready to go. It is amazingly easy to set up a hot standby server with postgres. In a nutshell, everything that needs to be done is: create a replication user in the database, do a base backup of the primary database, copy that to the standby server, create a recovery.conf file and start up the standby database. Let's start by creating the user which will be used for the recovery in the primary database:

[root@ppas ~]# su - enterprisedb
-bash-4.2$ . ./pgplus_env.sh 
-bash-4.2$ psql
psql.bin (9.4.1.3)
Type "help" for help.

edb=# create role standby LOGIN REPLICATION UNENCRYPTED PASSWORD 'standby';
CREATE ROLE
edb=# commit;
COMMIT
edb'# 

... and adjust the pg_hba.conf file (the second entry is for the base backup later):

-bash-4.2$ tail -2 data/pg_hba.conf
host    replication     standby         192.168.56.244/24          md5
local   replication     standby                                              md5

... and adjust the wal-level in postgresql.conf

-bash-4.2$ grep wal_level data/postgresql.conf 
wal_level = hot_standby			# minimal, archive, hot_standby, or logical

For the pg_hba.conf change to take effect a reload of the main server process would be enough, but changing wal_level requires a complete restart:

-bash-4.2$ pg_ctl -D data/ restart
waiting for server to shut down..... done
server stopped
server starting

Now it is a good time to test if we can connect to the primary database from the standby node:

[root@oel7 tmp]# /opt/PostgresPlus/9.4AS/bin/psql -h 192.168.56.243 -U standby edb
Password for user standby: 
psql.bin (9.4.1.3)
Type "help" for help.

edb=> 

Ready for the basebackup of the primary database?

mkdir /var/tmp/primary_base_backup/
-bash-4.2$ pg_basebackup -D /var/tmp/primary_base_backup/ -U standby -F t -R -x -z -l for_standby -P
Password: 
56517/56517 kB (100%), 1/1 tablespace
-bash-4.2$ 

Especially notice the "-R" switch of pg_basebackup as this creates a minimal recovery.conf for us which we can use as a template for our standby database. Transfer and extract the file written to the standby server (I again prepared passwordless ssh authentication between the primary and the standby server. check the second post on how to do that).

bash-4.2$ pwd
/opt/PostgresPlus/9.4AS/data
bash-4.2$ scp 192.168.56.243:/var/tmp/primary_base_backup/* .
base.tar.gz                                                                                                  100% 5864KB   5.7MB/s   00:00    
-bash-4.2$ 
-bash-4.2$ tar -axf base.tar.gz 
-bash-4.2$ ls
backup_label  dbms_pipe  pg_dynshmem    pg_log        pg_notify    pg_snapshots  pg_subtrans  PG_VERSION            postgresql.conf
base          global     pg_hba.conf    pg_logical    pg_replslot  pg_stat       pg_tblspc    pg_xlog               recovery.conf
base.tar.gz   pg_clog    pg_ident.conf  pg_multixact  pg_serial    pg_stat_tmp   pg_twophase  postgresql.auto.conf
-bash-4.2$ 

Almost ready. Now we need to adjust the recovery.conf file:

standby_mode = 'on'
primary_conninfo = 'host=192.168.56.243 port=5444 user=standby password=standby'
restore_command = 'scp bart@192.168.56.245:/opt/backup/ppas94/archived_wals/%f %p'

... and enable hot standby mode in the postgresql.conf file on the standby server and adjust the listen address:

-bash-4.2$ grep hot postgresql.conf 
wal_level = hot_standby			# minimal, archive, hot_standby, or logical
hot_standby = on			# "on" allows queries during recovery
#hot_standby_feedback = off		# send info from standby to prevent
-bash-4.2$ grep listen data/postgresql.conf
listen_addresses = '192.168.56.244'		# what IP address(es) to listen on;

Start up the standby database, and if everything is fine, messages similar to these should be reported in the postgresql log file (/opt/PostgresPlus/9.4AS/data/pg_log/):

2015-04-29 14:03:36 CEST LOG:  entering standby mode
scp: /opt/backup/ppas94/archived_wals/000000010000000000000017: No such file or directory
2015-04-29 14:03:36 CEST LOG:  consistent recovery state reached at 0/17000090
2015-04-29 14:03:36 CEST LOG:  redo starts at 0/17000090
2015-04-29 14:03:36 CEST LOG:  record with zero length at 0/170000C8
2015-04-29 14:03:36 CEST LOG:  database system is ready to accept read only connections
2015-04-29 14:03:36 CEST LOG:  started streaming WAL from primary at 0/17000000 on timeline 1
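
As an additional check (not part of the original walkthrough), the streaming connection can also be verified from the primary:

-- run on the primary: one row per connected standby
select client_addr, state, sent_location, replay_location from pg_stat_replication;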

To further prove the setup, let's create a simple table in the primary database and add some rows to it:

edb=# create table standby_test ( a int ); 
CREATE TABLE
edb=# insert into standby_test values (1);
INSERT 0 1
edb=# insert into standby_test values (2);
INSERT 0 1
edb=# commit;
COMMIT
edb=# \! hostname
ppas.local
edb=# 

Let's see if we can query the table on the standby:

-bash-4.2$ psql
psql.bin (9.4.1.3)
Type "help" for help.

edb=# select * from standby_test;
 a 
---
 1
 2
(2 rows)

edb=# \! hostname
ppasstandby.local
edb=# 

Cool. Minimal effort for getting a hot standby database up and running. Make yourself familiar with the various settings that influence the behavior of the standby database. I'll write another post on how to do failovers in the near future.

Variations on 1M rows insert (1): bulk insert

Fri, 2015-05-01 13:30

This blog is the first one of a series about inserting 1 million rows. Do you have an idea about the time it takes? The idea came from another project, about NoSQL databases, but I'll focus on Oracle in those posts. There are a lot of different ways to insert 1 million rows in Oracle: row-by-row insert or bulk, row-by-row commit or not, different ways to commit, In-Memory Database, Database In-Memory Option, etc.

Test case

Here is the table I create:

create table DEMO ("id" number , "text" varchar2(15), "number" number) ;

Here is the first PL/SQL I'll run:

declare
 type people_array is varray(12) of varchar(15);
 people people_array := people_array( 'Marc', 'Bill', 'George', 'Eliot', 'Matt', 'Trey', 'Tracy','Greg', 'Steve', 'Kristina', 'Katie', 'Jeff' );
 people_count number :=people.COUNT;
 n number;
begin
 for i in 0..1e6 loop
  n:=trunc(dbms_random.value(0,10000));
  insert into DEMO values( i , people( dbms_random.value(1,people_count) ) , n );
 end loop;
 commit;
end;
/

You may wonder why I chose this testcase. As I said, this comes from another project and you can google to find where the test case comes from. The performance will be mainly about inserts, but the idea is also to implement some procedural processing, to show that it's possible in different database systems - here in PL/SQL on Oracle, and in a future post in TimesTen.

 

Test environment

All the tests will be done on a VM with only one CPU activated. This is important because some activity will involve background processes.

$ grep "model name" /proc/cpuinfo
model name      : Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz

 

Oracle 11g

My first tests are done on the first Oracle Developer Day VM that was available for download. It's an 11g version. We will see 12c later.

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production

And it's 32-bit:

SQL> select addr,4*length(addr) from v$process where rownum=1;

ADDR     4*LENGTH(ADDR)
-------- --------------
3B3C9F80             32

 

With and without index

As you have seen above, I've created the table with a primary key, so I have an index on it. How long does it take to run the statement above to insert 1 million rows?

PL/SQL procedure successfully completed.
Elapsed: 00:00:46.80

On one CPU it takes 46 seconds to insert one million rows, row by row, from a simple PL/SQL loop. We will take that as the baseline for future comparisons. Of course I've done several runs, and I keep them only as long as there is less than 5% difference between them.

 

Just to show that the index has an important cost when inserting, let's run the same testcase without the PRIMARY KEY:

PL/SQL procedure successfully completed.
Elapsed: 00:00:35.10

I'll keep the testcase with the primary key index as the point of comparison. There is no point inserting one million rows if we can't retrieve one easily. And the table-with-primary-key pattern is the most relevant one when comparing different databases - SQL or NoSQL.

 

Bulk insert

The previous inserts are quite optimal: directly done from PL/SQL in the database, so no roundtrips. Commit only at the end, and we know that in Oracle it's better not to commit too often. I will cover the need for row-by-row commits in a future post. But this is still row-by-row insert, and that is not optimal. When we have all rows to insert from one session, then we can insert in bulk (one execution of the insert statement inserts several rows).

There are several ways to do that depending on the client language. Here we are in PL/SQL, so it is easy: put all rows into a collection and call the insert with that collection:

declare
 type people_array is varray(12) of varchar(15);
 people people_array := people_array( 'Marc', 'Bill', 'George', 'Eliot', 'Matt', 'Trey', 'Tracy','Greg', 'Steve', 'Kristina', 'Katie', 'Jeff' );
 people_count number :=people.COUNT;
 -- declare the collection
 type DEMO_record_type is record ("user_id" number , "name" varchar2(15), "number" number) ;
 type DEMO_collection_type is table of DEMO_record_type index by binary_integer;
 DEMO_collection DEMO_collection_type;
begin
 -- fill the collection
 for i in 1..1e6 loop
  DEMO_collection(i)."user_id":=i;
  DEMO_collection(i)."name":=people( dbms_random.value(1,people.count) );
  DEMO_collection(i)."number":=trunc(dbms_random.value(0,10000));
  null;
 end loop;
 -- call the insert
 forall i in 1..DEMO_collection.count insert into DEMO values DEMO_collection(i);
 commit;
end;
/

The result is obvious:

PL/SQL procedure successfully completed.
Elapsed: 00:00:06.55

And for the testcase without the primary key index:

PL/SQL procedure successfully completed.
Elapsed: 00:00:03.80

So bulk inserts here are roughly 7x to 9x faster.

 

conclusion

The immediate conclusion is that when you have a lot of rows to insert and you don't need to commit at each row, then you should do it in bulk. Here I did it from PL/SQL, which runs directly in the database. If you are inserting from an application server or a remote client, then the elimination of roundtrips will be even more important.

 

In future posts we will see what happens when you need to commit for each row. And I will also run it with In-Memory database and with Database In-Memory. Don't forget that if you are in Switzerland in June, our experts from Oracle, Microsoft, and SAP technologies will talk about In-Memory as implemented by SQL Server, Oracle and SAP HANA. All the same day. It's free and you can register now: Event In-Memory: boost your IT performance!

Using Windows 2012 R2 & dynamic witness feature with minimal configurations - part II

Thu, 2015-04-30 11:53

I wrote a blog post some time ago about using a file share witness with a minimal Windows failover cluster configuration that consists of two cluster nodes. In this blog post, I said I was reluctant to use a witness in this case because it introduces a weakness in the availability process. Indeed, the system is not able to adjust the node weight in this configuration, but that does not mean that we don't need a witness in this case, and this is what I want to clarify here. I admit I was wrong on this subject for some time.

Let’s set the scene with a pretty simple Windows failover cluster architecture that includes two nodes and with dynamic quorum but without a configured witness. The node vote configuration is as follows:

 

blog_38_-_1_-_cluster_nodes_state
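The node vote configuration shown above can also be queried with the FailoverClusters PowerShell module (a quick sketch; Get-ClusterNode is a standard cmdlet on Windows Server 2012 R2):

Import-Module FailoverClusters

# NodeWeight is the configured vote, DynamicWeight the vote currently assigned by dynamic quorum
Get-ClusterNode | Select-Object Name, State, NodeWeight, DynamicWeight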

 

At this point the system randomly assigns the node weight to one of the available nodes. For instance, in my context the vote is assigned to the SQL143 node, but there is a weakness in this configuration. Let's first say the node SQL141 goes down in an unplanned scenario. In this case the cluster keeps functioning because the node SQL143 has the vote (last man standing). Now, let's say this time the node SQL143 goes down in an unplanned scenario. In this case the cluster will lose the quorum because the node SQL141 doesn't have the vote to survive. You will find related entries in the cluster event log, as shown in the next picture, with two specific event ids (1135 and 1177).

 

blog_38_-_2_-_event_viewer

 

However, if the node SQL143 is gracefully shut down, the cluster will be able to remove the vote from the node SQL143 and give it to the node SQL141. But you know, I'm a follower of Murphy's law: anything that can go wrong will go wrong, and this is particularly true in the IT world.

So we don’t have the choice here. To protect from unplanned failure with two nodes, we should add a witness and at this point you may use either a disk or a file share witness. My preference is to promote first the disk quorum type but it is often not suitable with customers especially for geo cluster configuration. In this case using file share witness is very useful but it might introduce some important considerations about quorum resiliency. First of all, I want to exclude scenarios where the cluster resides on one datacenter. There are no really considerations here because the loose of the datacenter implies the unavailability of the entire cluster (and surely other components).  

Let’s talk about geo location clusters often used with SQL Server availability groups and where important considerations must be made about the file share witness localization. Indeed, most of my customers are dealing only with two datacenters and in this case the 100$ question is where to place it? Most of time, we will place the witness in the location of what we can call the primary datacenter. If the connectivity is lost between the two datacenters the service stays functioning in the primary datacenter. However a manual activation will be required in the event of full primary data center failure.

 

blog_38_-_3_-_geo_clust_primary_without_change

blog_38_-_3_-_geo_clust_primary_

 

 

 

Another scenario consists in placing the witness in the secondary datacenter. Unlike our first scenario, a network failure between the two datacenters will trigger an automatic failover of the resources to the secondary datacenter, but in the event of a complete failure of the secondary datacenter, the cluster will lose the quorum (as a reminder, the remaining node is not able to survive on its own).

 

blog_38_-_4_-_geo_clust_secondary_

        blog_38_-_4_-_geo_clust_secondary_failover

 

 

As you can see, each of the aforementioned scenarios has its advantages and drawbacks. A better situation would be to have a third datacenter to host the witness. Indeed, in the event of a network failure between the two datacenters that host the cluster nodes, the vote will this time be assigned to the node which first successfully locks the file share witness.

Keep in mind that even in this third case, losing the witness (either because of a network failure between the two main datacenters and the third datacenter, or because the file share used by the witness is accidentally deleted by an administrator) can compromise the entire cluster availability in case of a failure of the node which has the vote. So be sure to monitor this critical resource correctly.

So, I would like to finish with a personal thought. I always wondered why, in the case of a minimal configuration (only 2 cluster nodes and a FSW), the cluster is not able to perform weight adjustment. Until now I didn't get an answer from Microsoft, but after some time, I think this weird behavior is quite normal. Let's imagine the scenario where your file share witness resource is in a failed state and the cluster is able to perform weight adjustment. Which of the nodes would it choose? The primary or the secondary? In fact it doesn't matter, because in both cases the next failure of the node which has the vote will also shut down the cluster. Finally, it is just delaying an inevitable situation …

Happy clustering !

Windows Server Next, AKA Windows Server 2016!

Thu, 2015-04-30 02:10

The first technical preview of the future version of Windows Server has been available since last October (here), and a second one with more new features should be available in May.
The final version, which should normally have been released in 2015, has recently been postponed to 2016. It will be the first time that the client and server releases are decoupled.

This new version of Windows Server will include:

  • new and changed functionalities for Hyper-V
  • improvements for Remote Desktop Services
  • new and updated functionalities for Failover Clustering
  • significant new features with PowerShell 5.0
  • directory services, Web application proxy and other features
  • ...

According to Microsoft employee Jeffrey Snover, the next version of Windows Server has been deeply refactored to really build a cloud-optimized server: a server deeply refactored for cloud scenarios.

b2ap3_thumbnail_WinowsServer2016_Roadmap.jpg

The goal is to scope the server to my needs in order to use only the required components.

On top of this cloud-optimized server, the full server will be built: the same server that we have for the moment, compatible with what we have, but with two application profiles:

  • the first application profile will target the existing set of server APIs
  • the second will be a subset of APIs which will be cloud-optimized

Microsoft is also working to further clarify the difference between Server and Client, to avoid mixing client APIs and server APIs, for example.

Microsoft will also introduce Docker containers to its new Windows Server 2016! A container is a compute environment, also called a compute container.

We will have two flavors of compute containers:

  • one for application compatibility (server running in a container)
  • a second optimized for the cloud (cloud-optimized server)


Docker container

The goal of Docker is to embed an application into a virtual container. Via the container, the application can be executed without any problem on Windows or Linux servers. This technology facilitates the deployment of applications and is offered as open source under the Apache license by an American company called Docker.

b2ap3_thumbnail_WinowsServer2016_Docker1.jpgb2ap3_thumbnail_WinowsServer2016_Docker2.jpg

A container is very lightweight as it does not contain its own operating system. In fact, it uses the host machine for all of the system calls.

Migration of Docker containers is easier as they are small.

The biggest cloud providers, like Amazon with AWS, Microsoft with Azure and Google with Google Compute, have already integrated this new technology... Dealing with Docker containers gives the possibility to migrate from one cloud to another easily.


Nano server

In addition, the Docker container technology which will come with Windows Server 2016 will be part of a set of application deployment services, called Nano Server.

According to an internal Microsoft presentation published by WZor, Nano Server is presented as “The future of Windows Server”.

Nano Server will follow a zero-footprint model: server roles and optional features will reside outside of it. There will be no binaries or metadata in the image, just standalone packages.

Hyper-V, Clustering, Storage, Core CLR, ASP.NET V.Next, PaaS v2, containers will be part of the new roles and features.

The goal is also to change the mentality of server management and to tend towards remote management and process automation via Core PowerShell and WMI. In order to facilitate remote management, local tools like Task Manager, Registry Editor, Event Viewer... will be replaced by web-based tools accessible via a remote connection.

This new solution will be integrated also in Visual Studio.

 

In conclusion, WZor summarized Nano Server as “a nucleus of next-gen cloud infrastructure and applications”. This shows the direction that Microsoft wants to give to Windows Server 2016: even better integration with the cloud, optimization for new distributed applications and easier management.

getting started with postgres plus advanced server (2) – setting up a backup and recovery server

Wed, 2015-04-29 23:50

The first post in this series explained how to get ppas installed on a Linux system. Now that the database cluster is up and running, we should immediately take care of backup and recovery. For this I'll use another system where I'll install and configure bart. So, the system overview for now is:

server     ip address        purpose
ppas       192.168.56.243    ppas database cluster
ppasbart   192.168.56.245    backup and recovery server


As bart requires the postgres binaries I'll just repeat the ppas installation on the bart server. Check the first post on how to do that.

tip: there is a "--extract-only" switch which only extracts the binaries without bringing up a database cluster.

After that just install the bart rpm:

yum localinstall edb-bart-1.0.2-1.rhel6.x86_64.rpm

All the files will be installed under:

ls -la /usr/edb-bart-1.0/
total 20
drwxr-xr-x.  4 root root    44 Apr 23 13:41 .
drwxr-xr-x. 14 root root  4096 Apr 23 13:41 ..
drwxr-xr-x.  2 root root    17 Apr 23 13:41 bin
drwxr-xr-x.  2 root root    21 Apr 23 13:41 etc
-rw-r--r--.  1 root root 15225 Jan 27 15:24 license.txt

Having a dedicated user for bart is a good idea:

# groupadd bart
# useradd -g bart bart
# passwd bart
Changing password for user bart.
New password: 
Retype new password: 
passwd: all authentication tokens updated successfully.

As backups need some space a top level directory for all the bart backups needs to be created:

# mkdir /opt/backup
chown bart:bart /opt/backup
chmod 700 /opt/backup
mkdir -p /opt/backup/ppas94/archived_wals

Now everything is in place to start the bart configuration. A minimal configuration file would look like this:

cat /usr/edb-bart-1.0/etc/bart.cfg
[BART]
bart-host = bart@192.168.56.245
backup_path = /opt/backup
pg_basebackup_path = /opt/PostgresPlus/9.4AS/bin/pg_basebackup
logfile = /var/tmp/bart.log
xlog-method = fetch

[PPAS94]
host = 192.168.56.243
port = 5444
user = enterprisedb
description = "PPAS 94 server"

The BART section is the global section, while the next sections are specific to the database clusters to back up and restore. As bart requires passwordless ssh authentication between the bart host and the database host to be backed up, let's set this up. On the bart host (ppasbart):

su - bart
ssh-keygen -t rsa

On the host where database runs ( ppas ):

su -
cd /opt/PostgresPlus/9.4AS
mkdir .ssh
chown enterprisedb:enterprisedb .ssh/
chmod 700 .ssh/
su - enterprisedb
ssh-keygen -t rsa

As the public keys are now available we'll need to make them available on each host. On the ppas host:

cat .ssh/id_rsa.pub > .ssh/authorized_keys
chmod 600 .ssh/authorized_keys

Add the public key from the bart host to the authorized_keys file above. Example: get the public key from the bart host:

[bart@ppasbart ~]$ id
uid=1001(bart) gid=1001(bart) groups=1001(bart) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[bart@ppasbart ~]$ cat .ssh/id_rsa.pub 
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCn+DN//ia+BocR6kTfHkPoXfx3/HRU5KM1Bqy1nDeGnUn98CSl3kbRkUkiyumDfj4XOIoxOxnVJw6Invyi2VjzeQ12XMMILBFRBAoePDpy4kOQWY+SaS215G72DKzNYY8nGPUwjaQdFpFt3eQhwLP4D5uqomPIi9Dmv7Gp8ZHU0DBgJfrDaqrg8oF3GrzF50ZRjZTAkF3pDxJnrzIEEme+QQFKVxBnSU2ClS5XHdjMBWg+oSx3XSEBHZefP9NgX22ru52lTWmvTscUQbIbDo8SaWucIZC7uhvljteN4AuAdMv+OUblOm9ZUtO2Y8vX8hNMJvqRBlYh9RGl+m6wUZLN bart@ppasbart.local

Copy/paste this key into the authorized_keys file for the enterprisedb user on the database host, so that the file looks similar to this:

cat .ssh/id_rsa.pub 
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCn+DN//ia+BocR6kTfHkPoXfx3/HRU5KM1Bqy1nDeGnUn98CSl3kbRkUkiyumDfj4XOIoxOxnVJw6Invyi2VjzeQ12XMMILBFRBAoePDpy4kOQWY+SaS215G72DKzNYY8nGPUwjaQdFpFt3eQhwLP4D5uqomPIi9Dmv7Gp8ZHU0DBgJfrDaqrg8oF3GrzF50ZRjZTAkF3pDxJnrzIEEme+QQFKVxBnSU2ClS5XHdjMBWg+oSx3XSEBHZefP9NgX22ru52lTWmvTscUQbIbDo8SaWucIZC7uhvljteN4AuAdMv+OUblOm9ZUtO2Y8vX8hNMJvqRBlYh9RGl+m6wUZLN l
[bart@ppasbart ~]$ cat .ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDAQZWeegLpqVB20c3cIN0Bc7pN6OjFM5pBsunDbO6SQ0+UYxZGScwjnX9FSOlmYzqrlz62jxV2dOJBHgaJj/mbFs5XbmvFw6Z4Zj224aBOXAfej4nHqVnn1Tpuum4HIrbsau3rI+jLCNP+MKnumwM7JiG06dsoG4PeUOghCLyFrItq2/uCIDHWoeQCqqnLD/lLG5y1YXQCSR4VkiQm62tU0aTUBQdZWnvtgskKkHWyVRERfLOmlz2puvmmc5YxmQ5XBVMN5dIcIZntTfx3JC3imjrUl10L3hkiPkV0eAt3KtC1M0n9DDao3SfHFfKfEfp5p69vvpZM2uGFbcpkQrtN l
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCn+DN//ia+BocR6kTfHkPoXfx3/HRU5KM1Bqy1nDeGnUn98CSl3kbRkUkiyumDfj4XOIoxOxnVJw6Invyi2VjzeQ12XMMILBFRBAoePDpy4kOQWY+SaS215G72DKzNYY8nGPUwjaQdFpFt3eQhwLP4D5uqomPIi9Dmv7Gp8ZHU0DBgJfrDaqrg8oF3GrzF50ZRjZTAkF3pDxJnrzIEEme+QQFKVxBnSU2ClS5XHdjMBWg+oSx3XSEBHZefP9NgX22ru52lTWmvTscUQbIbDo8SaWucIZC7uhvljteN4AuAdMv+OUblOm9ZUtO2Y8vX8hNMJvqRBlYh9RGl+m6wUZLN

Make the file the same on the bart host and test if you can connect without passwords:

[bart@ppasbart ~]$ hostname
ppasbart.local
[bart@ppasbart ~]$ ssh bart@ppasbart
Last login: Thu Apr 23 14:24:39 2015 from ppas
[bart@ppasbart ~]$ logout
Connection to ppasbart closed.
[bart@ppasbart ~]$ ssh enterprisedb@ppas
Last login: Thu Apr 23 14:24:47 2015 from ppas
-bash-4.2$ logout
Connection to ppas closed.

Do the same test on the ppas host:

bash-4.2$ hostname
ppas.local
-bash-4.2$ ssh bart@ppasbart
Last login: Thu Apr 23 14:22:07 2015 from ppasbart
[bart@ppasbart ~]$ logout
Connection to ppasbart closed.
-bash-4.2$ ssh enterprisedb@ppas
Last login: Thu Apr 23 14:22:18 2015 from ppasbart
-bash-4.2$ logout
Connection to ppas closed.
-bash-4.2$ 

Once this works, we need to set up a replication user in the database being backed up. So create the user in the database which runs on the ppas host (I'll do that with the enterprisedb user instead of the postgres user, as we'll need to adjust the pg_hba.conf file right after creating the user):

[root@ppas 9.4AS]# su - enterprisedb
Last login: Thu Apr 23 14:25:50 CEST 2015 from ppasbart on pts/1
-bash-4.2$ . pgplus_env.sh
-bash-4.2$ psql -U enterprisedb
psql.bin (9.4.1.3)
Type "help" for help.

edb=# CREATE ROLE bart WITH LOGIN REPLICATION PASSWORD 'bart';       
CREATE ROLE
edb=# exit
-bash-4.2$ echo "host    all     bart         192.168.56.245/32          md5" >> data/pg_hba.conf

Make sure that the IP matches your bart host. Then adjust the bart.cfg file on the bart host to match your configuration:

cat /usr/edb-bart-1.0/etc/bart.cfg
[BART]
bart-host = bart@192.168.56.245
backup_path = /opt/backup
pg_basebackup_path = /opt/PostgresPlus/9.4AS/bin/pg_basebackup
logfile = /var/tmp/bart.log
xlog-method = fetch

[PPAS94]
host = 192.168.56.243
port = 5444
user = bart
remote-host = enterprisedb@192.168.56.243
description = "PPAS 94 remote server"

Another requirement is that the bart database user must be able to connect to the database without prompting for a password. Thus we create the .pgpass file on the bart host which is used for reading the password:

[bart@ppasbart ~]$ cat .pgpass 
192.168.56.243:5444:*:bart:bart
[bart@ppasbart ~]$ chmod 600 .pgpass

As a last step we need to enable wal archiving on the database that should be backed up. The following parameters need to be set in the postgresql.conf file:

wal_level = archive  # or higher
archive_mode = on
archive_command = 'scp %p bart@192.168.56.245:/opt/backup/ppas94/archived_wals/%f'
max_wal_senders = 1  # or higher

Once done restart the database cluster:

su -
service ppas-9.4 restart

Let's see if bart can see anything on the bart server:

[bart@ppasbart ~]$ /usr/edb-bart-1.0/bin/bart -c /usr/edb-bart-1.0/etc/bart.cfg SHOW-SERVERS -s PPAS94 
Server name         : ppas94
Host name           : 192.168.56.243
User name           : bart
Port                : 5444
Remote host         : enterprisedb@192.168.56.243
Archive path        : /opt/backup/ppas94/archived_wals
WARNING: xlog-method is empty, defaulting to global policy
Xlog Method         : fetch
Tablespace path(s)  : 
Description         : "PPAS 94 remote server"

Looks fine. So let's do a backup:

[bart@ppasbart ~]$ /usr/edb-bart-1.0/bin/bart -c /usr/edb-bart-1.0/etc/bart.cfg BACKUP -s PPAS94

INFO:  creating backup for server 'ppas94'
INFO:  backup identifier: '1429795268774'
WARNING: xlog-method is empty, defaulting to global policy
56357/56357 kB (100%), 1/1 tablespace

INFO:  backup checksum: 6e614f981902c99326a7625a9c262d98
INFO:  backup completed successfully

Cool. Let's see what is in the backup catalog:

[root@ppasbart tmp]# ls -la /opt/backup/
total 0
drwx------. 3 bart bart 19 Apr 23 15:02 .
drwxr-xr-x. 4 root root 38 Apr 23 13:49 ..
drwx------. 4 bart bart 46 Apr 23 15:21 ppas94
[root@ppasbart tmp]# ls -la /opt/backup/ppas94/
total 4
drwx------. 4 bart bart   46 Apr 23 15:21 .
drwx------. 3 bart bart   19 Apr 23 15:02 ..
drwx------. 2 bart bart   36 Apr 23 15:21 1429795268774
drwx------. 2 bart bart 4096 Apr 23 15:21 archived_wals
[root@ppasbart tmp]# ls -la /opt/backup/ppas94/1429795268774/
total 56364
drwx------. 2 bart bart       36 Apr 23 15:21 .
drwx------. 4 bart bart       46 Apr 23 15:21 ..
-rw-rw-r--. 1 bart bart       33 Apr 23 15:21 base.md5
-rw-rw-r--. 1 bart bart 57710592 Apr 23 15:21 base.tar
[root@ppasbart tmp]# ls -la /opt/backup/ppas94/archived_wals/
total 81928
drwx------. 2 bart bart     4096 Apr 23 15:21 .
drwx------. 4 bart bart       46 Apr 23 15:21 ..
-rw-------. 1 bart bart 16777216 Apr 23 15:10 000000010000000000000002
-rw-------. 1 bart bart 16777216 Apr 23 15:13 000000010000000000000003
-rw-------. 1 bart bart 16777216 Apr 23 15:20 000000010000000000000004
-rw-------. 1 bart bart 16777216 Apr 23 15:21 000000010000000000000005
-rw-------. 1 bart bart 16777216 Apr 23 15:21 000000010000000000000006
-rw-------. 1 bart bart      304 Apr 23 15:21 000000010000000000000006.00000028.backup

Use the SHOW-BACKUPS switch to get an overview of the available backups:

[bart@ppasbart ~]$ /usr/edb-bart-1.0/bin/bart -c /usr/edb-bart-1.0/etc/bart.cfg SHOW-BACKUPS 
 Server Name   Backup ID       Backup Time           Backup Size  
                                                                  
 ppas94        1429795268774   2015-04-23 15:21:23   55.0371 MB   
 ppas94        1429795515326   2015-04-23 15:25:18   5.72567 MB   
 ppas94        1429795614916   2015-04-23 15:26:58   5.72567 MB   
                                                                  

A backup without a restore proves nothing, so let's try to restore one of the backups on the ppas server to a different directory:

[root@ppas 9.4AS]# mkdir /opt/PostgresPlus/9.4AS/data2
[root@ppas 9.4AS]# chown enterprisedb:enterprisedb /opt/PostgresPlus/9.4AS/data2

On the ppasbart host do the restore:

[bart@ppasbart ~]$ /usr/edb-bart-1.0/bin/bart -c /usr/edb-bart-1.0/etc/bart.cfg RESTORE -s PPAS94 -i 1429795614916 -r enterprisedb@ppas -p /opt/PostgresPlus/9.4AS/data2
INFO:  restoring backup '1429795614916' of server 'ppas94'
INFO:  restoring backup to enterprisedb@ppas:/opt/PostgresPlus/9.4AS/data2
INFO:  base backup restored
INFO:  archiving is disabled
INFO:  backup restored successfully at enterprisedb@ppas:/opt/PostgresPlus/9.4AS/data2

Looks good. Let's see what is in the data2 directory on the ppas host:

[root@ppas 9.4AS]# ls /opt/PostgresPlus/9.4AS/data2
backup_label  dbms_pipe  pg_clog      pg_hba.conf    pg_log      pg_multixact  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION  postgresql.auto.conf
base          global     pg_dynshmem  pg_ident.conf  pg_logical  pg_notify     pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_xlog     postgresql.conf
[root@ppas 9.4AS]# ls /opt/PostgresPlus/9.4AS/data2/pg_xlog
000000010000000000000008  archive_status

Looks good, too. As this is all on the same server we need to change the port before bringing up the database:

-bash-4.2$ grep port postgresql.conf  | head  -1
port = 5445				# (change requires restart)
-bash-4.2$ pg_ctl start -D data2/
server starting
-bash-4.2$ 2015-04-23 16:01:30 CEST FATAL:  data directory "/opt/PostgresPlus/9.4AS/data2" has group or world access
2015-04-23 16:01:30 CEST DETAIL:  Permissions should be u=rwx (0700).

Ok, fine. Change it:

-bash-4.2$ chmod 700 /opt/PostgresPlus/9.4AS/data2
-bash-4.2$ pg_ctl start -D data2/
server starting
-bash-4.2$ 2015-04-23 16:02:00 CEST LOG:  redirecting log output to logging collector process
2015-04-23 16:02:00 CEST HINT:  Future log output will appear in directory "pg_log".

Seems ok, let's connect:

-bash-4.2$ psql -p 5445 -U bart
Password for user bart: 
psql.bin (9.4.1.3)
Type "help" for help.

edb=> \l
                                           List of databases
   Name    |    Owner     | Encoding |   Collate   |    Ctype    | ICU |       Access privileges       
-----------+--------------+----------+-------------+-------------+-----+-------------------------------
 edb       | enterprisedb | UTF8     | en_US.UTF-8 | en_US.UTF-8 |     | 
 postgres  | enterprisedb | UTF8     | en_US.UTF-8 | en_US.UTF-8 |     | 
 template0 | enterprisedb | UTF8     | en_US.UTF-8 | en_US.UTF-8 |     | =c/enterprisedb              +
           |              |          |             |             |     | enterprisedb=CTc/enterprisedb
 template1 | enterprisedb | UTF8     | en_US.UTF-8 | en_US.UTF-8 |     | =c/enterprisedb              +
           |              |          |             |             |     | enterprisedb=CTc/enterprisedb
(4 rows)

Cool. Works. But: archiving is disabled and you'll need to enable it again. This is the default behavior of bart, as it adds "archive_mode = off" to the end of postgresql.conf. When you re-enable it, take care to adjust the archive_command parameter, as otherwise the archived wals would be scp'ed to the same directory on the ppasbart server as the ones of the original database. Can we do a point in time recovery? Let's try (I'll destroy the restored database cluster and will use the same data2 directory):

-bash-4.2$ pg_ctl -D data2 stop -m fast
waiting for server to shut down.... done
server stopped
-bash-4.2$ rm -rf data2/*
-bash-4.2$ 

Let's try the restore to a specific point in time:

[bart@ppasbart ~]$ /usr/edb-bart-1.0/bin/bart -c /usr/edb-bart-1.0/etc/bart.cfg RESTORE -s PPAS94 -i 1429795614916 -r enterprisedb@ppas -p /opt/PostgresPlus/9.4AS/data2 -g '2015-04-03 15:23:00'
INFO:  restoring backup '1429795614916' of server 'ppas94'
INFO:  restoring backup to enterprisedb@ppas:/opt/PostgresPlus/9.4AS/data2
INFO:  base backup restored
INFO:  creating recovery.conf file
INFO:  archiving is disabled
INFO:  backup restored successfully at enterprisedb@ppas:/opt/PostgresPlus/9.4AS/data2

Seems ok, but what is the difference? When specifying a point in time a recovery.conf file will be created for the restored database cluster:

-bash-4.2$ cat data2/recovery.conf
restore_command = 'scp -o BatchMode=yes -o PasswordAuthentication=no bart@192.168.56.245:/opt/backup/ppas94/archived_wals/%f %p'
recovery_target_time = '2015-04-03 15:23:00'

Let's start the database (after changing the port again in postgresql.conf):

-bash-4.2$ pg_ctl -D data2 start
server starting
-bash-4.2$ 2015-04-23 16:16:12 CEST LOG:  redirecting log output to logging collector process
2015-04-23 16:16:12 CEST HINT:  Future log output will appear in directory "pg_log".

Are we able to connect?

-bash-4.2$ psql -U bart -p 5445 
Password for user bart: 
psql.bin (9.4.1.3)
Type "help" for help.

edb=>

Works, too. So now we have a central backup server for our postgresql infrastructure, from which backups and restores can be executed. Combine this with backup software (like NetBackup, etc.) which picks up the backups from the bart server and you should be fine. In the next post we'll set up a hot standby database server.

A migration pitfall with ALL COLUMN SIZE AUTO

Wed, 2015-04-29 13:05

When you migrate, you should be prepared to face some execution plan changes. That's not new. But here I'll show you a case where you get several bad execution plans because a lot of histograms are missing. The version is the same. The system is the same. You've migrated with DataPump, importing all statistics. You have the same automatic job to gather statistics with all default options. You have repeated the migration several times on a system where you constantly reproduce the load. You have done a lot of regression tests. Everything was ok.