Yann Neuhaus

dbi services technical blog

Documentum – DFC traces setup & investigation

Sat, 2017-11-25 12:11

When working with Documentum, you will most probably have to enable the DFC traces one day or another and then work with them to analyze a problem. The purpose of this blog is simply to show how the DFC traces can be enabled, which tools can be used to process them quickly, and what the limitations of such an approach are.

Enabling the DFC traces can be done very easily by updating the dfc.properties file of the client. This client can be a DA, D2, JMS, Index Agent, and so on. The change is applied on the fly: tracing is active as soon as dfc.tracing.enable=true and disabled by default (when the property is commented out or set to false). If the dfc.properties is inside a war file (for DA/D2 for example) and you deployed your application as a war file (not exploded), then disabling the tracing might require a restart of your application. To avoid that, you can have a dfc.properties inside the war file that just points to another one outside of the war file; enabling/disabling the traces from this second file will then work properly. There are a lot of options to customize how the traces should be generated. Here is a first example with only a few properties that you can reuse every time you need traces:

dfc.tracing.enable=true
dfc.tracing.verbose=true
dfc.tracing.max_stack_depth=0
dfc.tracing.mode=compact
dfc.tracing.dir=/tmp/dfc_tracing

 

Another example with more properties to really specify what you want to see:

dfc.tracing.enable=true
dfc.tracing.verbose=true
dfc.tracing.max_stack_depth=4
dfc.tracing.include_rpcs=true
dfc.tracing.mode=standard
dfc.tracing.include_session_id=true
dfc.tracing.user_name_filter[0]=dmadmin
dfc.tracing.user_name_filter[1]=myuser
dfc.tracing.thread_name_filter[0]=Thread-3
dfc.tracing.thread_name_filter[1]=Thread-25
dfc.tracing.timing_style=milliseconds_from_start
dfc.tracing.dir=/tmp/dfc_tracing
dfc.tracing.file_prefix=mydfc
dfc.tracing.max_backup_index=10
dfc.tracing.max_file_size=104857600
...
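A note on the sizes: the value used for dfc.tracing.max_file_size above looks cryptic but is simply 100 MB expressed in bytes:

```shell
# 100 MB in bytes, as used for dfc.tracing.max_file_size above
echo $((100 * 1024 * 1024))   # → 104857600
```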

 

All these properties are quite easy to understand even without explanation, but you can find more information and all the possible options in the official Documentum documentation. That's not the main purpose of this blog, so I'm just mentioning a few properties to get started. By default, the generated files will be named something like "dfctrace.timestamp.log"; you can change that by setting "dfc.tracing.file_prefix", for example. Adding and customizing the properties changes the display format and style inside the files, so if you want a repeatable way to analyze these DFC traces, it is better to always use more or less the same set of options. For the example below, OTX asked me to use these properties only:

dfc.tracing.enable=true
dfc.tracing.verbose=true
dfc.tracing.max_stack_depth=4
dfc.tracing.include_rpcs=true
dfc.tracing.mode=compact
dfc.tracing.include_session_id=true
dfc.tracing.dir=/tmp/dfc_tracing

 

When you have your DFC traces, you need a way to analyze them. They are pretty much readable, but it would be complicated to get something out of them without spending a certain amount of time – unless you already know what you are looking for – simply because there is a lot of information inside… For that purpose, Ed Bueche developed, more than 10 years ago, some AWK scripts to parse the DFC trace files: traceD6.awk and trace_rpc_histD6.awk. You can find these scripts at the following locations (all EMC links, so they might stop working at some point in the future):

As you can see above, they are not really maintained, and the same scripts or a mix of several versions can be found in several locations, which can be a little confusing. All the old links are about the AWK scripts, but since 2013 there has also been a Python script (also developed by Ed Bueche).

In this blog, I mainly wanted to talk about the AWK scripts. Earlier this month, I was working with OTX on some performance tuning tasks, and for that I gathered the DFC traces for several scenarios, in different log files, well separated, and so on. Then I provided them to OTX for analysis. OTX came back to me a few minutes later saying that most of the traces were corrupted and asking me to regenerate them. I wasn't quite OK with that, simply because it takes time and because there was testing in progress on this environment, so gathering clean DFC traces for several scenarios would have forced the tests to be stopped, and so on… (OK OK, you got me, I'm just lazy ;))

The content of the DFC traces looked correct to me, and after a quick verification I saw that OTX was using the AWK scripts (traceD6.awk and trace_rpc_histD6.awk) to analyze the logs but was apparently getting an error. The files didn't look corrupted to me, so I mentioned to OTX that the issue might very well be with the AWK scripts they were using. They didn't really listen to what I said and stayed focused on getting a new set of DFC traces. I had already used these scripts but never really looked inside, so this was the perfect reason to take some time to do so:

[dmadmin@content_server_01 ~]$ cd /tmp/dfc_tracing/
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ ls -l trace* dfctrace.*.log
-rw-r-----. 1 dmadmin dmadmin 92661060 Nov 3 09:24 dfctrace.1510220481672.log
-rw-r-----. 1 dmadmin dmadmin 3240 Nov 4 14:10 traceD6.awk
-rw-r-----. 1 dmadmin dmadmin 7379 Nov 4 14:10 traceD6.py
-rw-r-----. 1 dmadmin dmadmin 5191 Nov 4 14:10 trace_rpc_histD6.awk
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ awk -f traceD6.awk < dfctrace.1510220481672.log > output_awk_1.log
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ wc -l output_awk_1.log
2 output_awk_1.log
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ awk -f trace_rpc_histD6.awk < dfctrace.1510220481672.log > output_awk_2.log
awk: trace_rpc_histD6.awk:203: (FILENAME=- FNR=428309) fatal: division by zero attempted
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ wc -l output_awk_2.log
4 output_awk_2.log
[dmadmin@content_server_01 dfc_tracing]$

 

As you can see above, the first script generated a log file that contains only 2 lines, which is already suspicious even though there is no error. The second script generated an error, and its log file contains only 4 lines… The input DFC trace file has a size of 90 MB, so it's clear that something is wrong, and that's why OTX said the DFC traces were corrupted. The error message shows the line (203) at the origin of the issue as well as a "division by zero attempted" message. This obviously means that somewhere on this line there is a division whose divisor is equal to 0, or not set at all. Since I love all kinds of UNIX scripting, I would rather fix the bug in the script than generate a new set of DFC traces (a new set would still be impacted by the issue anyway…)! Checking inside the trace_rpc_histD6.awk file, line 203 is the following one:

[dmadmin@content_server_01 dfc_tracing]$ grep -n -C1 "TIME SPENT" trace_rpc_histD6.awk
202-    printf ("DURATION (secs):\t%17.3f\n", ((curr_tms - st_tms)) );
203:    printf ("TIME SPENT EXECUTING RPCs (secs):%8.3f (which is %3.2f percent of total time)\n", total_rpc_time, 100*total_rpc_time/(curr_tms - st_tms));
204-    printf ("Threads :\t%25d\n", thread_cnt);
[dmadmin@content_server_01 dfc_tracing]$

 

The only division on this line is the total time spent executing RPCs divided by the duration of the log file (timestamp of the last message minus timestamp of the first one). So the value of "curr_tms – st_tms" is 0. In theory, both variables could simply hold the exact same value, but since the first and last messages in the DFC traces don't have the same timestamp, this isn't possible; therefore both variables are actually 0 or not set at all. Let's check where, how, and in which function these variables are set:

[dmadmin@content_server_01 dfc_tracing]$ grep -n -C15 -E "curr_tms|st_tms" trace_rpc_histD6.awk | grep -E "curr_tms|st_tms|^[0-9]*[:-][^[:space:]]"
144-/ .RPC:/ {
159:                    st_tms = $1;
162:            curr_tms = $1;
175-}
177-/obtained from pool/ {
--
187-}
188-/.INFO: Session/ {
193-}
197-END {
202:    printf ("DURATION (secs):\t%17.3f\n", ((curr_tms - st_tms)) );
203:    printf ("TIME SPENT EXECUTING RPCs (secs):%8.3f (which is %3.2f percent of total time)\n", total_rpc_time, 100*total_rpc_time/(curr_tms - st_tms));
[dmadmin@content_server_01 dfc_tracing]$

 

This shows that the only location where these two variables are set is inside the matching pattern "/ .RPC:/" (st_tms is set to $1 only on the first execution). So this portion of code is never executed; in other words, this pattern is never found in the DFC trace file. Why is that? Well, it's pretty simple: the DFC trace file contains a lot of RPC calls, but these lines never contain " .RPC:" – there are always at least two dots (something like " ..RPC:", " ...RPC:" or " ....RPC:"). The reason there are several dots is simply that the RPC entries are indented according to the stack depth at which they are called… In this case, OTX asked us to use "dfc.tracing.max_stack_depth=4", so that is what I did, and it is the reason why the AWK scripts cannot work by default: they need "dfc.tracing.max_stack_depth=0", as written in the comment sections at the beginning of the scripts.
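To see the difference the leading space makes, here are two hypothetical lines mimicking the depth-0 and depth-4 trace formats (the exact line layout is an assumption for illustration; only the dot prefix matters):

```shell
# First line as produced with max_stack_depth=0, second with max_stack_depth=4
# (one dot per stack level):
printf '%s\n' \
  'ts1 [Thread-3] .RPC: EXEC_QUERY'    \
  'ts2 [Thread-3] ....RPC: EXEC_QUERY' > /tmp/depth_sample.log

awk '/ .RPC:/' /tmp/depth_sample.log | wc -l   # original pattern → 1 (depth-0 line only)
awk '/.RPC:/'  /tmp/depth_sample.log | wc -l   # fixed pattern    → 2 (all depths)
```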

So a simple way to fix both the traceD6.awk and trace_rpc_histD6.awk scripts is to remove the space at the beginning of the pattern; after that, they work for all max_stack_depth values:

[dmadmin@content_server_01 dfc_tracing]$ grep -n ".RPC:/" *.awk
traceD6.awk:145:/ .RPC:/ {
trace_rpc_histD6.awk:144:/ .RPC:/ {
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ sed -i 's,/ .RPC:/,/.RPC:/,' *.awk
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ grep -n ".RPC:/" *.awk
traceD6.awk:145:/.RPC:/ {
trace_rpc_histD6.awk:144:/.RPC:/ {
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ awk -f traceD6.awk < dfctrace.1510220481672.log > output_awk_1.log
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ wc -lc output_awk_1.log
 1961 163788 output_awk_1.log
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ awk -f trace_rpc_histD6.awk < dfctrace.1510220481672.log > output_awk_2.log
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ wc -lc output_awk_2.log
 367 49050 output_awk_2.log
[dmadmin@content_server_01 dfc_tracing]$

 

That looks much better… Basically, the first script lists all RPCs with their thread, name and timings, while the second script creates a list of the queries that took the most time to execute, sorted in descending order, as well as a list of calls and occurrences per type/name.
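Since the first script's output is ampersand-separated (as visible in the sample outputs further below), a quick per-RPC aggregation can also be done with a few lines of shell; a sketch on a made-up sample in that format:

```shell
# Build a tiny sample in the "time & duration & [thread] & call ..." format
# that traceD6.awk produces (values are invented for illustration):
cat > /tmp/rpc_sample.log <<'EOF'
68354.130 & 0.005 & [http--0.0.0.0-9082-3] & EXEC_QUERY  select r_object_id from dm_user
68354.165 & 0.002 & [http--0.0.0.0-9082-4] & EXEC_QUERY  select r_object_id from dm_sysobject
68354.167 & 0.002 & [http--0.0.0.0-9082-4] & IsCurrent
EOF

# Aggregate call count and total duration per RPC name (first word of field 4):
awk -F' & ' '{ split($4, a, " "); n[a[1]]++; t[a[1]] += $2 }
             END { for (k in n) printf "%s %d %.3f\n", k, n[k], t[k] }' \
    /tmp/rpc_sample.log | sort
# → EXEC_QUERY 2 0.007
#   IsCurrent 1 0.002
```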

The AWK and Python scripts, even if they globally work, might have some issues with commas, parentheses and the like (again, it depends on which dfc.tracing options you selected). This is why I mentioned above that there are both an AWK and a Python version of these scripts. Sometimes the AWK scripts will contain the right information, sometimes the Python version will, but in all cases the latter runs much faster. So if you want to work with these scripts, you will have to juggle a little bit:

[dmadmin@content_server_01 dfc_tracing]$ python traceD6.py dfctrace.1510220481672.log > output_py_1.log
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ wc -lc output_py_1.log
 1959 194011 output_py_1.log
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ python traceD6.py dfctrace.1510220481672.log -profile > output_py_2.log
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ wc -lc output_py_2.log
 342 65917 output_py_2.log
[dmadmin@content_server_01 dfc_tracing]$

 

As you can see, there are fewer lines in the Python output files, but that's because some unnecessary headers have been removed in the Python version, so it's actually normal. However, there are many more characters, which shows that, in this case, the extracted DQL queries contain more characters. That doesn't mean these characters are actually part of the DQL queries: below you will see references to ", FOR_UPDATE=F, BATCH_HINT=50, BOF_DQL=T]],50,true,true)". This is NOT part of the DQL, but it is present in the output of the Python script while it is not in the AWK one:

[dmadmin@content_server_01 dfc_tracing]$ head -15 output_awk_1.log
analysis program version 2 based on DFC build 6.0.0.76
68354.130 & 0.005 & [http--0.0.0.0-9082-3] & EXEC_QUERY  select r_object_id from dm_sysobject where folder ('/Home') and object_name = 'Morgan Patou'
68354.135 & 0.000 & [http--0.0.0.0-9082-3] & multiNext
68354.136 & 0.005 & [http--0.0.0.0-9082-3] & SysObjFullFetch  0b0f12345004f0de
68354.165 & 0.002 & [http--0.0.0.0-9082-4] & EXEC_QUERY  select r_object_id from dm_user where user_name = 'Morgan Patou'
68354.167 & 0.000 & [http--0.0.0.0-9082-4] & multiNext
68354.167 & 0.002 & [http--0.0.0.0-9082-4] & IsCurrent
68354.170 & 0.003 & [http--0.0.0.0-9082-4] & EXEC_QUERY  SELECT COUNT(*) AS items FROM dm_group WHERE group_name = 'report_user' AND ANY i_all_users_names = 'Morgan Patou'
68354.173 & 0.001 & [http--0.0.0.0-9082-4] & multiNext
68354.175 & 0.003 & [http--0.0.0.0-9082-4] & EXEC_QUERY  select r_object_id from dm_sysobject where folder ('/myInsight') and object_name = 'myInsight.license'
68354.178 & 0.001 & [http--0.0.0.0-9082-4] & multiNext
68354.179 & 0.001 & [http--0.0.0.0-9082-4] & IsCurrent
68354.165 & 0.010 & [http--0.0.0.0-9082-3] & SysObjGetPermit
68354.175 & 0.006 & [http--0.0.0.0-9082-3] & SysObjGetXPermit
68354.181 & 0.006 & [http--0.0.0.0-9082-4] & MAKE_PULLER
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ head -15 output_py_1.log
68354.130 & 0.005 & [http--0.0.0.0-9082-3] & EXEC_QUERY & select r_object_id from dm_sysobject where folder ('/Home') and object_name = 'Morgan Patou', FOR_UPDATE=F, BATCH_HINT=50, BOF_DQL=T]],50,true,true)
68354.135 & 0.000 & [http--0.0.0.0-9082-3] & multiNext &
68354.136 & 0.005 & [http--0.0.0.0-9082-3] & SysObjFullFetch & 0b0f12345004f0de
68354.165 & 0.002 & [http--0.0.0.0-9082-4] & EXEC_QUERY & select r_object_id from dm_user where user_name = 'Morgan Patou', FOR_UPDATE=F, BATCH_HINT=50, BOF_DQL=T]],50,true,true)
68354.167 & 0.000 & [http--0.0.0.0-9082-4] & multiNext &
68354.167 & 0.002 & [http--0.0.0.0-9082-4] & IsCurrent & 110f123450001d07
68354.170 & 0.003 & [http--0.0.0.0-9082-4] & EXEC_QUERY & SELECT COUNT(*) AS items FROM dm_group WHERE group_name = 'report_user' AND ANY i_all_users_names = 'Morgan Patou', FOR_UPDATE=T, BATCH_HINT=50, BOF_DQL=T, FLUSH_BATCH=-1]],50,true,true)
68354.173 & 0.001 & [http--0.0.0.0-9082-4] & multiNext &
68354.175 & 0.003 & [http--0.0.0.0-9082-4] & EXEC_QUERY & select r_object_id from dm_sysobject where folder ('/myInsight') and object_name = 'myInsight.license', FOR_UPDATE=F, BATCH_HINT=50, BOF_DQL=T]],50,true,true)
68354.178 & 0.001 & [http--0.0.0.0-9082-4] & multiNext &
68354.179 & 0.001 & [http--0.0.0.0-9082-4] & IsCurrent & 090f123450023f63
68354.165 & 0.010 & [http--0.0.0.0-9082-3] & SysObjGetPermit & 0b0f12345004f0de
68354.175 & 0.006 & [http--0.0.0.0-9082-3] & SysObjGetXPermit & 0b0f12345004f0de
68354.181 & 0.006 & [http--0.0.0.0-9082-4] & MAKE_PULLER & null
68354.187 & 0.000 & [http--0.0.0.0-9082-4] & getBlock &
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$
[dmadmin@content_server_01 dfc_tracing]$ head -35 output_py_2.log

****** PROFILE OF rpc CALLS *****
     3.273           0.080              41      AUTHENTICATE_USER
     0.032           0.002              17      BEGIN_TRANS
     0.001           0.000              14      END_PUSH_V2
     0.202           0.012              17      END_TRANS
    21.898           0.071             310      EXEC_QUERY
     0.028           0.005               6      FETCH_CONTENT
     0.011           0.000              55      GET_ERRORS
     0.117           0.004              27      GET_LOGIN
     0.290           0.002             163      IsCurrent
     0.013           0.000              82      KILL_PULLER
     0.003           0.000              14      KILL_PUSHER
     0.991           0.012              82      MAKE_PULLER
     0.005           0.000              14      MAKE_PUSHER
     0.002           0.000               5      NEXT_ID_LIST
     0.083           0.002              38      NORPC
     0.015           0.005               3      RelationCopy
     0.446           0.032              14      SAVE
     0.274           0.014              20      SAVE_CONT_ATTRS
     0.140           0.010              14      START_PUSH
     0.134           0.045               3      SysObjCheckin
     0.048           0.016               3      SysObjCheckout
     2.199           0.009             240      SysObjFullFetch
     0.913           0.006             141      SysObjGetPermit
     0.764           0.005             141      SysObjGetXPermit
     0.642           0.046              14      SysObjSave
     0.033           0.000              82      getBlock
     1.454           0.004             399      multiNext

**** QUERY RESPONSE SORTED IN DESCENDING ORDER ****

10.317  select distinct wf.object_name as workflow_name, pr.object_name as process_name, i.name as Performer_Name, i.task_name as Task_Name, i.date_sent as Date_Task_Sent, i.actual_start_date as Date_Task_Acquired, wf.r_creator_name as Workflow_Initiator, cd.primary_group as "group", cd.subgroup as subgroup, cd.artifact_name as Artifact_Name, cd.object_name as document_name, cd.r_version_label as version_label, cd.title as Document_Title, cd.r_object_id as object_id from cd_common_ref_model(all) cd, dmi_package p, dmi_queue_item i, dm_workflow wf, dm_process pr
0.607   select r_object_id from dm_sysobject where folder ('/myInsight/Presentations/Standard Presentations/Graphical Reports') and object_name = 'FusionInterface.xsl', FOR_UPDATE=F, BATCH_HINT=50, BOF_DQL=T]],50,true,true)
0.505   select r_object_id from dm_sysobject where folder ('/myInsight/Presentations/Life Sciences') and object_name = 'Unique Templates.xsl', FOR_UPDATE=F, BATCH_HINT=50, BOF_DQL=T]],50,true,true)
[dmadmin@content_server_01 dfc_tracing]$
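When comparing both outputs, or when you only want the clean DQL, this residue is easy to strip since it always starts with ", FOR_UPDATE="; a sketch on a single sample line (the same sed can be applied to the whole output file):

```shell
# One line taken from the Python output above:
line="68354.130 & 0.005 & [http--0.0.0.0-9082-3] & EXEC_QUERY & select r_object_id from dm_user where user_name = 'Morgan Patou', FOR_UPDATE=F, BATCH_HINT=50, BOF_DQL=T]],50,true,true)"

# Drop everything from ", FOR_UPDATE=" to the end of the line:
echo "$line" | sed 's/, FOR_UPDATE=.*$//'
# → 68354.130 & 0.005 & [http--0.0.0.0-9082-3] & EXEC_QUERY & select r_object_id from dm_user where user_name = 'Morgan Patou'
```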

 

To conclude this blog on a more philosophical note: always question what other people ask you to do, and think twice before doing the same thing over and over again. ;)

 

 

The article Documentum – DFC traces setup & investigation appeared first on Blog dbi services.

#DOAG2017

Fri, 2017-11-24 16:02


The discussions about the technologies we love. With Bryn about my tests on the MLE and the fact that I compared very different things, running a recursive function on different datatypes (integer vs. number). With Mike about the way RUs will be recommended, with RURs only for very special cases. With Nigel about the ODC Database Ideas, with Stefan about what is documented or not, with… Discussions about community also, and user groups.

The trip, where meeting fellow speakers starts in the plane,…

The dinners with ACEs, with Speakers, with friends…

The beers, thanks to Pieter & Philippe for sharing Belgian beers & cheese & mustard & celery salt & your good mood

The sessions of course. Kamil’s tool to show tablespace fragmentation visually, Jan’s comparison between Oracle and EDB, Philippe & Pieter’s technical view on GDPR, Adam’s research on NFS for his appliance,…

The party for sure,…

My session, and the very interesting questions I got… I was lucky to speak on the first day, and proud to speak on the Oak Table stream for the first time. I was happy to see many people already with a CDB, and even in production. Adoption is slow, but people come to it and finally notice that it is not a big change for the daily job.

And colleagues, of course. This is the conference where dbi services has a booth and several speakers. We are passionate and like to share. At the booth, we did some demos of Dbvisit Standby 8, Orachrome Lighty, and also the OpenDB Appliance. We met customers and candidates, talked about the technologies we love, and explained how we run our training workshops. It is also a great place to discuss among ourselves. Even if we have internal projects, and two ‘dbi xChange’ events every year, we are mainly at customers and have so much to share.

DOAG is an amazing conference: intense time compressed into 3 days. The incredibly friendly ambiance is hard to leave at the end of the conference. Fortunately, persistence and durability are guaranteed thanks to Kamil’s snapshots:

Some of the speakers at #DOAG2017 party – @MDWidlake @BrynLite @ChandlerDBA @FranckPachot @RoelH @pioro @boliniak @chrisrsaxon @phurley @kmensah @lleturgez @DBAKevlar @oraesque @MikeDietrichDE @OracleSK – it was fun :) pic.twitter.com/Oe2l26QxSp

— Kamil Stawiarski (@ora600pl) November 23, 2017

#DOAG2017 speakers dinner was awesome! pic.twitter.com/cSsUaf6VPB

— Kamil Stawiarski (@ora600pl) November 22, 2017

When you see how Kamil highlights each personality with a simple camera, can you imagine what he can do when organizing a conference? Keep an eye on the POUG website.

 

The article #DOAG2017 appeared first on Blog dbi services.

DOAG 2017: avg_row_len with virtual columns

Fri, 2017-11-24 11:47

At DOAG I attended the session “Top-level DB design for Big Data in ATLAS Experiment at CERN” given by Gancho Dimitrov. The presentation was actually very interesting. As part of Gancho’s improvement activities to reduce the space used by a table, he stored data in a 16-byte RAW format (instead of a string representing the hex values, which requires 36 bytes) and used virtual columns to compute the readable hex string.

So the original value is e.g. 21EC2020-3AEA-4069-A2DD-08002B30309D, which is reduced to 16 bytes by removing the '-' characters and converting the resulting hex string to RAW:

HEXTORAW(REPLACE('21EC2020-3AEA-4069-A2DD-08002B30309D', '-', ''))

The problem was that the long virtual columns add to the average row length statistic in Oracle. Here is a simple test case:


create table cern_test
(
GUID0 RAW(16)
,GUID1 RAW(16)
,GUID2 RAW(16)
,GUID0_CHAR as (SUBSTR(RAWTOHEX(GUID0),1,8)||'-'||
SUBSTR(RAWTOHEX(GUID0),9,4)||'-'||
SUBSTR(RAWTOHEX(GUID0),13,4)||'-'||
SUBSTR(RAWTOHEX(GUID0),17,4)||'-'||
SUBSTR(RAWTOHEX(GUID0),21,12))
,GUID1_CHAR as (SUBSTR(RAWTOHEX(GUID1),1,8)||'-'||
SUBSTR(RAWTOHEX(GUID1),9,4)||'-'||
SUBSTR(RAWTOHEX(GUID1),13,4)||'-'||
SUBSTR(RAWTOHEX(GUID1),17,4)||'-'||
SUBSTR(RAWTOHEX(GUID1),21,12))
,GUID2_CHAR as (SUBSTR(RAWTOHEX(GUID2),1,8)||'-'||
SUBSTR(RAWTOHEX(GUID2),9,4)||'-'||
SUBSTR(RAWTOHEX(GUID2),13,4)||'-'||
SUBSTR(RAWTOHEX(GUID2),17,4)||'-'||
SUBSTR(RAWTOHEX(GUID2),21,12))
);
 
insert into cern_test (guid0,guid1,guid2)
select HEXTORAW('21EC20203AEA4069A2DD08002B30309D'),
HEXTORAW('31DC20203AEA4069A2DD08002B30309D'),
HEXTORAW('41CC20203AEA4069A2DD08002B30309D')
from xmltable('1 to 10000');
commit;
 
exec dbms_stats.gather_table_stats(user,'CERN_TEST',estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE 1');
 
select avg_row_len from tabs where table_name='CERN_TEST';
 
AVG_ROW_LEN
-----------
162
 
select sum(avg_col_len) from user_tab_columns
where table_name='CERN_TEST' and column_name in ('GUID0','GUID1','GUID2');
 
SUM(AVG_COL_LEN)
----------------
51
 
select sum(avg_col_len) from user_tab_columns
where table_name='CERN_TEST' and column_name in ('GUID0_CHAR','GUID1_CHAR','GUID2_CHAR');
 
SUM(AVG_COL_LEN)
----------------
111

The question is whether Oracle’s computation of the average row length is correct, i.e. should the physically non-existent virtual columns be considered?
Physically they take no space, so the physical average row length in the example above is 51 bytes, while logically it is 162 bytes. Which is correct?
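As a quick sanity check, the two sums reported by the statistics queries above add up exactly to the reported avg_row_len:

```shell
# 51 (physical RAW columns) + 111 (virtual *_CHAR columns) = avg_row_len
echo $((51 + 111))   # → 162
```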

To answer that question, we have to check what the average row length is used for. That information is not documented, but my assumption is that it is only used to calculate the Bytes value required when doing a “select *”, i.e. selecting all columns. That number, however, may become important later on when calculating the memory required for e.g. a hash join.

Anyway, the basic question is how Oracle treats virtual columns in execution plans: does it compute the value of a virtual column when the table is accessed, or only when it actually needs it (e.g. when fetching the row or when using it as a join column)? According to the “Bytes” figures in the execution plan, the value is computed when the table is accessed:


SQL> explain plan for
2 select a.*, b.guid0 b_guid0 from cern_test a, cern_test b
3 where a.guid0_char=b.guid0_char;
 
Explained.
 
SQL> select * from table(dbms_xplan.display(format=>'+PROJECTION'));
 
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------
Plan hash value: 3506643611
 
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100M| 20G| 267 (83)| 00:00:01 |
|* 1 | HASH JOIN | | 100M| 20G| 267 (83)| 00:00:01 |
| 2 | TABLE ACCESS FULL| CERN_TEST | 10000 | 527K| 23 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| CERN_TEST | 10000 | 1582K| 23 (0)| 00:00:01 |
--------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
1 - access("A"."GUID0_CHAR"="B"."GUID0_CHAR")
 
Column Projection Information (identified by operation id):
-----------------------------------------------------------
 
1 - (#keys=1) "A"."GUID0_CHAR"[VARCHAR2,132], "GUID0"[RAW,16],
       "GUID0"[RAW,16], "GUID1"[RAW,16], "GUID2"[RAW,16]
   2 - "GUID0"[RAW,16]
   3 - "GUID0"[RAW,16], "GUID1"[RAW,16], "GUID2"[RAW,16]
23 rows selected.
 
SQL> select column_name, avg_col_len from user_tab_columns
2 where table_name='CERN_TEST' and column_name in ('GUID0','GUID0_CHAR');
 
COLUMN_NAME AVG_COL_LEN
----------------------------------- -----------
GUID0 17
GUID0_CHAR 37
 
SQL> select (10000*17)/1024 from dual;
 
(10000*17)/1024
---------------
166.015625
 
SQL> select ((10000*17)+(10000*37))/1024 from dual;
 
((10000*17)+(10000*37))/1024
----------------------------
527.34375

So according to the projection at step 2 of the plan, only B.GUID0 is used, but the Bytes value of 527K accounts for both GUID0 and the virtual column GUID0_CHAR. The Bytes calculation is therefore done when the table is accessed and not when the virtual column is actually needed (during the hash join).

In that regard, the calculation of avg_row_len by dbms_stats, with the virtual columns included, is correct.

The only issue I see is with old scripts written long ago, which try to compute the amount of data in a table based on its avg_row_len statistic using something like:


SELECT table_name, num_rows * avg_row_len actual_size_of_data
FROM user_tables order by 2;

If there are virtual columns in the table, such a query may return values for “actual_size_of_data” that are too high.
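To put a number on it with the statistics from the test case above (10,000 rows, 51 bytes of physical columns, 162 bytes of avg_row_len including the virtual columns), such a script overestimates the data volume by more than a factor of three:

```shell
num_rows=10000
real_cols=51       # sum(avg_col_len) of the three physical RAW columns
avg_row_len=162    # includes the three virtual *_CHAR columns

echo "avg_row_len estimate: $((num_rows * avg_row_len)) bytes"   # → 1620000
echo "physical data only:   $((num_rows * real_cols)) bytes"     # →  510000
```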

REMARK: Using the old ANALYZE command to gather statistics results in a value for the avg_row_len, which considers only “real” columns. However, ANALYZE must not be used anymore of course.

 

The article DOAG 2017: avg_row_len with virtual columns appeared first on Blog dbi services.

DOAG 2017

Thu, 2017-11-23 08:00

As a consultant at dbi services, I have spent the last two years mainly on consolidation projects based on the Oracle Engineered Systems. The talks about the new generation of the Oracle Database Appliance, the X7-2, were therefore naturally of interest to me.

In my view, Oracle has taken the right step by reducing the variety of ODA systems that still existed in the X6-2 generation (Small/Medium/Large and HA).

Going forward, the ODA X7-2 will only be available in three models (S, M and HA). It is worth noting that the smaller systems have been upgraded in terms of performance, and that the HA is finally available again in a configuration that allows the consolidation of larger database and application systems:

  • the ODA X7-2 S as an entry-level system with one 10-core CPU, up to 384 GB RAM and 12.8 TB of NVMe storage
  • the ODA X7-2 M now corresponds more to the X6 L systems, with 2×18 cores, up to 768 GB RAM and up to 51.2 TB of NVMe storage
  • the ODA X7-2 HA is of course the flagship of the ODA class: two servers with 2×18 cores each, up to 768 GB RAM per server and various storage expansions up to 150 TB give you the old X5 feeling (or maybe even the feeling of working with an Exadata)

For me, the most interesting changes are not so much on the hardware side but rather in the possible deployments of the systems:

  • all systems support SE/SE1/SE2/EE 11.2.0.4, 12.1.0.2 and 12.2.0.1
  • all systems support a virtualized setup, the small systems with KVM and the X7-2 HA with KVM and OVM; hard partitioning with KVM is not yet possible but is planned
  • on the X7-2 HA, the storage expansions can be either High Performance (SSD) or High Capacity (HDD); even mixed configurations are possible with some restrictions

Savings were made on the network interfaces, however: there are now only two interfaces instead of the four on the X5-2 (in addition to the private interfaces for the interconnect). It is possible to configure an additional interface after the deployment, but only at 1 GbE. VLAN configurations on the public interface are planned for bare-metal setups in the future, but Oracle could still have granted the HA in particular two additional interfaces, especially since the slots are available.

The performance figures of the NVMe and SSD storage are highly interesting: up to 100,000 IOPS are possible, and on the HA I see the SAS bus rather than the SSDs as the limiting factor. What is really nice is that the storage for the redo logs has been extended to 4×800 GB SSD; on the earlier systems you always had to be a bit economical there…

All in all, I am looking forward to working with the X7-2, as Oracle delivers a fine piece of hardware here that also remains reasonably priced.


The article DOAG 2017 appeared first on Blog dbi services.

Create index CONCURRENTLY in PostgreSQL

Wed, 2017-11-22 12:10

In PostgreSQL, when you create an index on a table, sessions that want to write to the table must by default wait until the index build has completed. There is a way around that, though, and in this post we’ll look at how you can avoid it.

As usual we’ll start with a little table:

postgres=# \! cat a.sql
drop table if exists t1;
create table t1 ( a int, b varchar(50));
insert into t1
select a.*, md5(a::varchar) from generate_series(1,5000000) a;
postgres=# \i a.sql
DROP TABLE
CREATE TABLE
INSERT 0 5000000

When you now create an index on that table and, at the same time, try to write to the table from a different session, that session will wait until the index is there (in my test, a first session created the index while a second session ran an update, which had to wait for the first one to complete).

For production environments this is not something you want to happen, as it can block a lot of other sessions, especially when the table in question is heavily used. You can avoid that by using “create index concurrently”.

Selection_008

Using that syntax, writes to the table from other sessions will succeed while the index is being built. But, as clearly stated in the documentation, the downside is that the table needs to be scanned twice, so more work has to be done, which means more resource usage on your server. Other points need to be considered as well. When, for whatever reason, your index build fails (e.g. by canceling the create index statement):

postgres=# create index concurrently i1 on t1(a);
^CCancel request sent
ERROR:  canceling statement due to user request

… you might expect the index not to be there at all, but this is not the case. When you try to create the index again right after the canceled statement you’ll hit this:

postgres=# create index concurrently i1 on t1(a);
ERROR:  relation "i1" already exists

This does not happen when you do not create the index concurrently:

postgres=# create index i1 on t1(a);
^CCancel request sent
ERROR:  canceling statement due to user request
postgres=# create index i1 on t1(a);
CREATE INDEX
postgres=# 

The question is why this happens in the concurrent case but not in the “normal” case. The reason is simple: when you create an index the “normal” way, the whole build is done in one transaction. Because of this, the index does not exist when the transaction is aborted (the create index statement is canceled). When you build the index concurrently, there are multiple transactions involved: “In a concurrent index build, the index is actually entered into the system catalogs in one transaction, then two table scans occur in two more transactions”. So in this case:

postgres=# create index concurrently i1 on t1(a);
ERROR:  relation "i1" already exists

… the index is already stored in the catalog:

postgres=# create index concurrently i1 on t1(a);
^CCancel request sent
ERROR:  canceling statement due to user request
postgres=# select relname,relkind,relfilenode from pg_class where relname = 'i1';
 relname | relkind | relfilenode 
---------+---------+-------------
 i1      | i       |       32926
(1 row)

If you don’t take care of that you will have invalid indexes in your database:

postgres=# \d t1
                        Table "public.t1"
 Column |         Type          | Collation | Nullable | Default 
--------+-----------------------+-----------+----------+---------
 a      | integer               |           |          | 
 b      | character varying(50) |           |          | 
Indexes:
    "i1" btree (a) INVALID
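
Invalid indexes are also flagged in the pg_index catalog, so you can list all of them at once. This is just a quick sketch relying on the indisvalid column:

postgres=# select indexrelid::regclass as index_name from pg_index where not indisvalid;
 index_name 
------------
 i1
(1 row)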

You might think that this does no harm, but then consider this case:

-- in session one build a unique index
postgres=# create unique index concurrently i1 on t1(a);
-- then in session two violate the uniqueness after some seconds
postgres=# update t1 set a = 5 where a = 4000000;
UPDATE 1
-- the create index statement will fail in the first session
postgres=# create unique index concurrently i1 on t1(a);
ERROR:  duplicate key value violates unique constraint "i1"
DETAIL:  Key (a)=(5) already exists.

This is even worse as the index now really consumes space on disk:

postgres=# select relpages from pg_class where relname = 'i1';
 relpages 
----------
    13713
(1 row)

The index is invalid, of course, and will not be used by the planner:

postgres=# \d t1
                        Table "public.t1"
 Column |         Type          | Collation | Nullable | Default 
--------+-----------------------+-----------+----------+---------
 a      | integer               |           |          | 
 b      | character varying(50) |           |          | 
Indexes:
    "i1" UNIQUE, btree (a) INVALID

postgres=# explain select * from t1 where a = 12345;
                              QUERY PLAN                              
----------------------------------------------------------------------
 Gather  (cost=1000.00..82251.41 rows=1 width=37)
   Workers Planned: 2
   ->  Parallel Seq Scan on t1  (cost=0.00..81251.31 rows=1 width=37)
         Filter: (a = 12345)
(4 rows)

But the index is still maintained:

postgres=# select relpages from pg_class where relname = 'i1';
 relpages 
----------
    13713
(1 row)
postgres=# insert into t1 select a.*, md5(a::varchar) from generate_series(5000001,6000000) a;
INSERT 0 1000000

postgres=# select relpages from pg_class where relname = 'i1';
 relpages 
----------
    16454
(1 row)

So now you have an index which cannot be used to speed up queries (which is bad), but which is still maintained when you write to the table (which is even worse, because you consume resources for nothing). The only way out of this is to drop and re-create the index:

postgres=# drop index i1;
DROP INDEX
-- potentially clean up any rows that violate the constraint and then
postgres=# create unique index concurrently i1 on t1(a);
CREATE INDEX
postgres=# \d t1
                        Table "public.t1"
 Column |         Type          | Collation | Nullable | Default 
--------+-----------------------+-----------+----------+---------
 a      | integer               |           |          | 
 b      | character varying(50) |           |          | 
Indexes:
    "i1" UNIQUE, btree (a)

postgres=# explain select * from t1 where a = 12345;
                          QUERY PLAN                           
---------------------------------------------------------------
 Index Scan using i1 on t1  (cost=0.43..8.45 rows=1 width=122)
   Index Cond: (a = 12345)
(2 rows)

Remember: when a create index operation fails in concurrent mode, make sure that you drop the index immediately.

One more thing to keep in mind: when you create an index concurrently and there is another session already modifying the data, the create index command waits until that other operation completes:

-- first session inserts data without completing the transaction
postgres=# begin;
BEGIN
Time: 0.579 ms
postgres=# insert into t1 select a.*, md5(a::varchar) from generate_series(6000001,7000000) a;
INSERT 0 1000000
-- second sessions tries to build the index
postgres=# create unique index concurrently i1 on t1(a);

The create index operation will wait until that completes:

postgres=# select query,state,wait_event,wait_event_type from pg_stat_activity where state ='active';
                                query                                 | state  | wait_event | wait_event_t
----------------------------------------------------------------------+--------+------------+-------------
 create unique index concurrently i1 on t1(a);                        | active | virtualxid | Lock
 select query,state,wait_event,wait_event_type from pg_stat_activity; | active |            | 

… meaning that when someone forgets to end the transaction, the create index command will wait forever. There is the parameter idle_in_transaction_session_timeout which gives you more control over that, but you still need to be aware of what is happening here.
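
To protect against such forgotten transactions you can, for example, set that timeout (session level shown here; the value is just an illustration):

postgres=# set idle_in_transaction_session_timeout = '5min';
SET

Sessions that sit idle inside an open transaction for longer than that get terminated, so the create index command cannot be blocked forever by them.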

Happy index creation :)

 

Cet article Create index CONCURRENTLY in PostgreSQL est apparu en premier sur Blog dbi services.

DOAG2017 my impressions

Wed, 2017-11-22 11:28

As every year at the end of November, the biggest European Oracle conference takes place in Nürnberg: #DOAG2017. This year is a little bit special, because the DOAG celebrates the 30th edition of the conference.

2017_DOAG_Banner
dbi services is present for the 5th time with a booth and 8 sessions at the DOAG.
IMG_3943
During the last 2 days I already followed many sessions, and I want to give you my impressions and feedback about the market trends.
Tuesday morning, as usual, the conference started with a keynote, which is often not very interesting, because it only repeats what was already communicated some weeks before at the Oracle Open World conference. But this year that was not the case: I saw a very interesting session from Neil Sholay (Oracle) about technology and market shifts that will have an impact on our near future. For example, in the near future your running shoes will be made directly in the shop with a 3D printer, and your clothes will be made directly by a machine in the shop, which is 17 times faster than clothes made by a man.

After this nice introduction, I followed a very interesting session from Guido Schmutz (Trivadis) about Kafka, with a very nice live demo. I like to see live demos, but it is something I see less and less at the DOAG; at dbi services we try to have interesting live demos in each of our sessions. Later, after a short break, I was very curious to see how many people would follow the session from Jan Karremans (EDB) comparing Oracle to PostgreSQL, and as expected the room was full. So I can confirm that the interest in PostgreSQL sessions at the DOAG is very high, because today most Oracle DBAs, beside their usual tasks, also have to manage PostgreSQL databases.

IMG_3938
This morning I followed a session from Mike Dietrich (Oracle) about the new Oracle database release model; as usual his session was very good, with several hundred participants.
The key message of the session: if you are still running Oracle database version 11.2.0.4, Mike advises upgrading very soon, because at the beginning of next year (5 weeks) you will enter the extended support period, with additional cost for the support. Last but not least, at the beginning of the afternoon I saw the session “Cloud provider battle” from Manfred Klimke (Trevisto). The interest in this session was also very high, because I suppose most of the participants are not in the Cloud yet and don’t know where they should go. During the session he presented a funny slide summarizing the available Cloud services with a pizza analogy, and I can confirm it reflects reality: “Dined at a restaurant” is the most expensive service.

pizza
As a conclusion of these 2 days: beside the Cloud, everything around Open Source is also a very important topic at the DOAG, which even hosts presentations from Oracle’s competitors.

 

Cet article DOAG2017 my impressions est apparu en premier sur Blog dbi services.

12c Multitenant Internals: compiling system package from PDB

Wed, 2017-11-22 07:38

When I explain the multitenant internals, I show that all metadata about system procedures and packages is stored only in CDB$ROOT and is accessed from the PDBs through metadata links. I take DBMS_SYSTEM as an example, which has nothing in SOURCE$ of the PDB, but I show that we can still compile it from the PDB. This is my way to prove that the session can access the system objects, internally switching to the root container when it needs to read SOURCE$. At the DOAG Conference I got a very interesting question about what happens exactly in CDB$ROOT: is the session really executing all the DML on the internal tables storing the compiled code of the procedure?

My first answer was something like ‘why not’, because a session in a PDB can switch and do modifications in CDB$ROOT internally. For example, even a local PDB DBA can change some ‘spfile’ parameters which are actually stored in CDB$ROOT. But then I realized that the question goes further: is the PDB session really compiling the DBMS_SYSTEM package in CDB$ROOT? Actually, there are some DDL statements that are transformed into no-operations when executed on the PDB.

To see which ones are concerned, the best is to trace:

SQL> alter session set events='10046 trace name context forever, level 4';
Session altered.
SQL> alter session set container=PDB1;
Session altered.
SQL> alter package dbms_system compile;
Package altered.
SQL> alter session set events='10046 trace name context off';
Session altered.

I’ll not show the whole trace here. For sure I can see that the session switches to CDB$ROOT to read the source code of the package:

*** 2017-11-22T08:36:01.963680+01:00 (CDB$ROOT(1))
=====================
PARSING IN CURSOR #140650193204552 len=54 dep=1 uid=0 oct=3 lid=0 tim=5178881528 hv=696375357 ad='7bafeab8' sqlid='9gq78x8ns3q1x'
select source from source$ where obj#=:1 order by line
END OF STMT
PARSE #140650193204552:c=0,e=290,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=5178881527
EXEC #140650295606992:c=1000,e=287,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=4,plh=813480514,tim=5178881999
FETCH #140650295606992:c=0,e=35,p=0,cr=4,cu=0,mis=0,r=1,dep=2,og=4,plh=813480514,tim=5178882057
CLOSE #140650295606992:c=0,e=12,dep=2,type=3,tim=5178882104

That was my point about metadata links. But now about modifications.

As I need to see only the statements, I can use TKPROF to get them aggregated, but then the container switch – like (CDB$ROOT(1)) here – is ignored.

Here is a small AWK script I use to add the Container ID to the SQL ID so that it is visible and detailed in the TKPROF output:

awk '/^[*]{3}/{con=$3}/^PARSING IN/{sub(/sqlid=./,"&"con" ")}{print > "con_"FILENAME }'

Then I run TKPROF on the resulting file, with ‘sort=(execu)’ so that I have the modifications (insert/delete/update) first. The result starts with something like this:

SQL ID: (PDB1(3)) 1gfaj4z5hn1kf Plan Hash: 1110520934
 
delete from dependency$
where
d_obj#=:1

I know that dependencies are replicated into all containers (because table metadata is replicated into all containers), so I see the following tables modified in the PDB: DEPENDENCY$, ACCESS$, DIANA_VERSION$, and of course OBJ$.

But to answer the initial question: there are no modifications done in CDB$ROOT, only SELECT statements there, on SOURCE$, SETTINGS$, CODEAUTH$, WARNING_SETTINGS$.

So, probably, the updates are transformed into no-ops once the session is aware that the source is the same (same signature), and it just reads the compilation status.

Just as a comparison, tracing the same compilation done on CDB$ROOT shows inserts/deletes/updates on ARGUMENT$, PROCEDUREINFO$, SETTINGS$, PROCEDUREPLSQL$, IDL_UB1$, IDL_SB4$, IDL_UB2$, IDL_CHAR$, … all those tables storing the compiled code.

So basically, when running DDL on metadata links in a PDB, not all the work is done in the CDB, especially not writing again what is already there (because you always upgrade CDB$ROOT first). However, up to 12.2 we don’t see a big difference in time. This should change in 18c, where the set of DDL to be run on the PDB will be pre-processed to avoid unnecessary operations.

 

Cet article 12c Multitenant Internals: compiling system package from PDB est apparu en premier sur Blog dbi services.

firewalld rules for Veritas Infoscale 7.3 with Oracle

Mon, 2017-11-20 06:30

You might wonder, but yes, Veritas is still alive and there are customers who use it and are very happy with it. Recently we upgraded a large cluster from Veritas 5/RHEL5 to Veritas InfoScale 7.3/RHEL7 and I must say that the migration was straightforward and very smooth (when I have time I’ll write another post specific to the migration). At some point during this project the requirement came up to enable the firewall on the Linux hosts, so we needed to figure out all the ports and then set up the firewall rules for that. This is how we did it…

The first step was to create a new zone because we did not want to modify any of the default zones:

root@:/home/oracle/ [] firewall-cmd --permanent --new-zone=OracleVeritas
root@:/home/oracle/ [] firewall-cmd --reload
success
root@:/home/oracle/ [] firewall-cmd --get-zones
OracleVeritas block dmz drop external home internal public trusted work

The ports required for Veritas InfoScale are documented here. This is the set of ports we defined:

##### SSH
root@:/home/oracle/ [] firewall-cmd --zone=OracleVeritas --permanent --add-service=ssh
##### Veritas ports
root@:/home/oracle/ [] firewall-cmd --zone=OracleVeritas --permanent --add-port=4145/udp            # vxio
root@:/home/oracle/ [] firewall-cmd --zone=OracleVeritas --permanent --add-port=4145/tcp            # vxio
root@:/home/oracle/ [] firewall-cmd --zone=OracleVeritas --permanent --add-port=5634/tcp            # xprtld
root@:/home/oracle/ [] firewall-cmd --zone=OracleVeritas --permanent --add-port=8199/tcp            # vras
root@:/home/oracle/ [] firewall-cmd --zone=OracleVeritas --permanent --add-port=8989/tcp            # vxreserver
root@:/home/oracle/ [] firewall-cmd --zone=OracleVeritas --permanent --add-port=14141/tcp           # had
root@:/home/oracle/ [] firewall-cmd --zone=OracleVeritas --permanent --add-port=14144/tcp           # notifier
root@:/home/oracle/ [] firewall-cmd --zone=OracleVeritas --permanent --add-port=14144/udp           # notifier
root@:/home/oracle/ [] firewall-cmd --zone=OracleVeritas --permanent --add-port=14149/tcp           # vcsauthserver
root@:/home/oracle/ [] firewall-cmd --zone=OracleVeritas --permanent --add-port=14149/udp           # vcsauthserver
root@:/home/oracle/ [] firewall-cmd --zone=OracleVeritas --permanent --add-port=14150/tcp           # CmdServer
root@:/home/oracle/ [] firewall-cmd --zone=OracleVeritas --permanent --add-port=14155/tcp           # wac
root@:/home/oracle/ [] firewall-cmd --zone=OracleVeritas --permanent --add-port=14155/udp           # wac
root@:/home/oracle/ [] firewall-cmd --zone=OracleVeritas --permanent --add-port=14156/tcp           # steward
root@:/home/oracle/ [] firewall-cmd --zone=OracleVeritas --permanent --add-port=14156/udp           # steward
root@:/home/oracle/ [] firewall-cmd --zone=OracleVeritas --permanent --add-port=443/tcp             # Vxspserv
root@:/home/oracle/ [] firewall-cmd --zone=OracleVeritas --permanent --add-port=49152-65535/tcp     # vxio
root@:/home/oracle/ [] firewall-cmd --zone=OracleVeritas --permanent --add-port=49152-65535/udp     # vxio
#### Oracle ports
root@:/home/oracle/ [] firewall-cmd --zone=OracleVeritas --permanent --add-port=1521/tcp            # listener
root@:/home/oracle/ [] firewall-cmd --zone=OracleVeritas --permanent --add-port=3872/tcp            # cloud control agent

Because we wanted the firewall only on the public network, but not on the interconnect, we changed the interfaces for the zone:

root@:/home/oracle/ [] firewall-cmd --zone=OracleVeritas --change-interface=bond0
root@:/home/oracle/ [] firewall-cmd --zone=OracleVeritas --change-interface=eth0
root@:/home/oracle/ [] firewall-cmd --zone=OracleVeritas --change-interface=eth2

One additional step to make this active is to add the zone to the interface configuration (this is done automatically if the interfaces are under the control of NetworkManager):

root@:/home/oracle/ [] echo "ZONE=OracleVeritas" >> /etc/sysconfig/network-scripts/ifcfg-eth0
root@:/home/oracle/ [] echo "ZONE=OracleVeritas" >> /etc/sysconfig/network-scripts/ifcfg-eth2
root@:/home/oracle/ [] echo "ZONE=OracleVeritas" >> /etc/sysconfig/network-scripts/ifcfg-bond0

Restart the firewall service:

root@:/home/oracle/ [] systemctl restart firewalld

… and it should be active:

root@:/home/postgres/ [] firewall-cmd --get-active-zones
OracleVeritas
  interfaces: eth0 eth2 bond0
public
  interfaces: eth1 eth3

root@:/home/oracle/ [] firewall-cmd --zone=OracleVeritas --list-all
OracleVeritas (active)
  target: default
  icmp-block-inversion: no
  interfaces: bond0 eth0 eth2
  sources: 
  services: 
  ports: 4145/udp 4145/tcp 5634/tcp 8199/tcp 8989/tcp 14141/tcp 14144/tcp 14144/udp 14149/tcp 14149/udp 14150/tcp 14155/tcp 14155/udp 14156/tcp 14156/udp 443/tcp 49152-65535/tcp 49152-65535/udp 1521/tcp 3872/tcp
  protocols: 
  masquerade: no
  forward-ports: 
  source-ports: 
  icmp-blocks: 
  rich rules: 

Just for completeness: You can also directly check the configuration file for the zone:

root@:/home/oracle/ [] cat /etc/firewalld/zones/OracleVeritas.xml

Hope this helps …

 

Cet article firewalld rules for Veritas Infoscale 7.3 with Oracle est apparu en premier sur Blog dbi services.

Is it an index, a table or what?

Sun, 2017-11-19 10:54

A recent tweet from Kevin Closson pointed out that in PostgreSQL it might be confusing whether something is an index or a table. Why is it like that? Let’s have a look and start by re-building the example from Kevin:

For getting into the same situation Kevin described we need something like this:

postgres=# create table base4(custid int, custname varchar(50));
CREATE TABLE
postgres=# create index base4_idx on base4(custid);
CREATE INDEX

Assuming that we forgot we created such an index, come back later and try to create it again, we get exactly the same behavior:

postgres=# create index base4_idx on base4(custid);
ERROR:  relation "base4_idx" already exists
postgres=# drop table base4_idx;
ERROR:  "base4_idx" is not a table
HINT:  Use DROP INDEX to remove an index.
postgres=# 

The keyword here is “relation”. In PostgreSQL a “relation” does not necessarily mean a table. What you need to know is that PostgreSQL stores everything that looks like a table/relation (e.g. has columns) in the pg_class catalog table. When we check our relations there:

postgres=# select relname from pg_class where relname in ('base4','base4_idx');
  relname  
-----------
 base4
 base4_idx
(2 rows)

… we can see that both the table and the index are somehow treated as a relation. The difference is here:

postgres=# \! cat a.sql
select a.relname 
     , b.typname
  from pg_class a
     , pg_type b 
 where a.relname in ('base4','base4_idx')
   and a.reltype = b.oid;
postgres=# \i a.sql
 relname | typname 
---------+---------
 base4   | base4
(1 row)

Indexes do not have an entry in pg_type, but tables do. What is even more interesting is that the “base4” table is a type itself. This means that for every table you create, a composite type is created as well that describes the structure of the table. You can even link back to pg_class:

postgres=# select typname,typrelid from pg_type where typname = 'base4';
 typname | typrelid 
---------+----------
 base4   |    32901
(1 row)

postgres=# select relname from pg_class where oid = 32901;
 relname 
---------
 base4
(1 row)

When you want to know what type a relation is of, the easiest way is to ask like this:

postgres=# select relname,relkind from pg_class where relname in ('base4','base4_idx');
  relname  | relkind 
-----------+---------
 base4     | r
 base4_idx | i
(2 rows)

… where:

  • r = ordinary table
  • i = index
  • S = sequence
  • t = TOAST table
  • m = materialized view
  • c = composite type
  • f = foreign table
  • p = partitioned table

Of course there are also catalog tables for tables and indexes, so you can double-check there as well. Knowing all this, the message is pretty clear:

postgres=# create index base4_idx on base4(custid);
ERROR:  relation "base4_idx" already exists
postgres=# drop relation base4_idx;
ERROR:  syntax error at or near "relation"
LINE 1: drop relation base4_idx;
             ^
postgres=# drop table base4_idx;
ERROR:  "base4_idx" is not a table
HINT:  Use DROP INDEX to remove an index.
postgres=# 

PostgreSQL finally tells you that “base4_idx” is an index and not a table, which is fine. Of course you could think that PostgreSQL should do that on its own, but it is also true that when you want to drop something, you should be sure about what you really want to drop.
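
Following the hint from above, the correct statement succeeds as expected:

postgres=# drop index base4_idx;
DROP INDEX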

 

Cet article Is it an index, a table or what? est apparu en premier sur Blog dbi services.

Unstructured vs. structured

Sat, 2017-11-18 01:13

The title of this blog post was: “Tracing DBMS_RCVMAN for reclaimable archivelogs” until I started to write the conclusion…

In a previous post I mentioned that there’s a bug with the archivelog deletion policy when you want to mention both ‘BACKED UP … TIMES TO …’ and ‘APPLIED’ or ‘SHIPPED’ as conditions for archived logs to be reclaimable. I opened an SR; they didn’t even try to reproduce it (and I can guarantee you can reproduce it in 2 minutes on any currently supported version), so I traced it myself to understand the bug and suggest the fix.

I traced the DBMS_RCVMAN with Kernel Recovery Area function SQL Tracing:

SQL> alter session set events 'trace[kra_sql] disk high, memory disable';
SQL> dbms_backup_restore.refreshAgedFiles;
SQL> alter session set events 'trace[kra_sql] off';

I know refreshAgedFiles checks for reclaimable files in the FRA since an old bug where we had to run it manually on databases in mount.

I compared the traces when changing the order of ‘APPLIED’ and ‘BACKED UP’ and found the following:

< *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: setRedoLogDeletionPolicy with policy = TO BACKED UP 1 TIMES TO DISK APPLIED ON ALL STANDBY
---
> *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: setRedoLogDeletionPolicy with policy = TO APPLIED ON ALL STANDBY BACKED UP 1 TIMES TO DISK
5340c5340
< *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: policy = TO BACKED UP 1 TIMES TO DISK APPLIED ON ALL STANDBY
---
> *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: policy = TO APPLIED ON ALL STANDBY BACKED UP 1 TIMES TO DISK
5343c5343
< *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: EXITING setRedoLogDeletionPolicy with policy = TO BACKED UP 1 TIMES TO DISK APPLIED ON ALL STANDBY with alldest = 1
---
> *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: EXITING setRedoLogDeletionPolicy with policy = TO APPLIED ON ALL STANDBY BACKED UP 1 TIMES TO DISK with alldest = 1
5350,5351c5350,5351
< *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: parseBackedUpOption devtype=DISK
< *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: parseBackedUpOption backed up conf - devtype=DISK , backups=1
---
> *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: parseBackedUpOption devtype=DISK
> *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: parseBackedUpOption backed up conf - devtype=DISK, backups=1
5363c5363
< *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: EXITING getBackedUpAl with TRUE
---
> *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: EXITING getBackedUpAl with key = 128 stamp = 958068130
5367c5367
< *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: EXITING getBackedUpFiles with: no_data_found
---
> *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: EXITING getBackedUpFiles

You see at the top the difference in the way I specified the deletion policy. You see at the bottom that the first one (starting with ‘BACKED UP’) didn’t find archivelogs being backed up (no_data_found), but the second one (starting with ‘APPLIED’) mentioned sequence# 128.

But if you look carefully, you see another difference in the middle: the “devtype=DISK” has an additional space before the comma in the first case.

So I traced a bit further, including SQL_TRACE, and found that the deletion policy code just uses some INSTR and SUBSTR parsing on the deletion policy text to find the policy, the number of backups, and the device type. For sure, looking for backups with DEVICE_TYPE=’DISK ‘ instead of ‘DISK’ will not find anything, and this is the reason for the bug: no archived logs backed up means no archived logs reclaimable.

If you look closer at DBMS_RCVMAN you will find that the device type is extracted with SUBSTR(:1, 1, INSTR(:1, ‘ ‘)) when the device type is followed by a space, which is the reason for this additional space. The correct extraction should be SUBSTR(:1, 1, INSTR(:1, ‘ ‘)-1), and this is what I suggested in the SR.
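
The off-by-one is easy to visualize with a quick query (just an illustration against DUAL, using a sample policy text):

SQL> select '['||substr(s,1,instr(s,' '))||']' as buggy,
  2         '['||substr(s,1,instr(s,' ')-1)||']' as fixed
  3    from (select 'DISK APPLIED ON ALL STANDBY' s from dual);

BUGGY   FIXED
------- ------
[DISK ] [DISK]

The first expression keeps the trailing space (‘DISK ‘), the second one returns the clean device type (‘DISK’).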

So what?

Writing the conclusion made me change the title. Currently, a lot of people are advocating for unstructured data, because it is easy (which rhymes with ‘lazy’): store information as it comes and postpone the parsing to a more structured data type until you need to process it. This seems to be how the RMAN configuration is stored: as the text we entered. And it is parsed later with simple text functions like INSTR(), SUBSTR(), and LIKE. But you can see how a little bug, such as reading an additional character, has big consequences. If you look at the archivelog deletion policy syntax, you have a 50% chance of running into this bug in a Data Guard configuration. The Recovery Area will fill up and your database will be blocked. The controlfile grows. Or you notice it before and run a ‘delete archivelog’ statement without knowing the reason, wasting space by removing recovery files from local storage which could have been kept longer. If the deletion policy were parsed immediately when entered, like SQL DDL or PL/SQL APIs, the issue would have been detected a long time ago. Structure and strong typing are the way to build robust applications.

 

Cet article Unstructured vs. structured est apparu en premier sur Blog dbi services.

CBO, FIRST_ROWS and VIEW misestimate

Thu, 2017-11-16 23:36

There are several bugs with the optimizer in FIRST_ROWS mode. Here is one I encountered during a 10.2.0.4 to 12.2.0.1 migration when a view had an ‘order by’ in its definition.

Here is the test case that reproduces the problem.

A big table:

SQL> create table DEMO1 (n constraint DEMO1_N primary key,x,y) as select 1/rownum,'x','y' from xmltable('1 to 1000000');
Table DEMO1 created.

with a view on it, and that view has an order by:

SQL> create view DEMOV as select * from DEMO1 order by n desc;
View DEMOV created.

and another table to join to:

SQL> create table DEMO2 (x constraint DEMO2_X primary key) as select dummy from dual;
Table DEMO2 created.

My query reads the view in a subquery, adds a call to a PL/SQL function, and joins the result with the other table:


SQL> explain plan for
select /*+ first_rows(10) */ *
from
( select v.*,dbms_random.value from DEMOV v)
where x in (select x from DEMO2)
order by n desc;
 
Explained.

You can see that I run it with FIRST_ROWS(10) because I actually want to fetch the top-10 rows when ordered by N. As N is a number, has an index on it, and has no nulls (it is the primary key), I expect to read the first 10 entries from the index, call the function for each of them, and then nested-loop to the other table.

In the situation where I encountered it, this is what was done in 10g, but when migrated to 12c the query took very long because it called the PL/SQL function for millions of rows. Here is the plan in my example:


SQL> select * from dbms_xplan.display(format=>'+projection');
 
PLAN_TABLE_OUTPUT
-----------------
Plan hash value: 2046425878
 
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 21 | | 7 (0)| 00:00:01 |
| 1 | NESTED LOOPS SEMI | | 1 | 21 | | 7 (0)| 00:00:01 |
| 2 | VIEW | DEMOV | 902 | 17138 | | 7 (0)| 00:00:01 |
| 3 | SORT ORDER BY | | 968K| 17M| 29M| 6863 (1)| 00:00:01 |
| 4 | TABLE ACCESS FULL | DEMO1 | 968K| 17M| | 1170 (1)| 00:00:01 |
| 5 | VIEW PUSHED PREDICATE | VW_NSO_1 | 1 | 2 | | 0 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | DEMO2_X | 1 | 2 | | 0 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
6 - access("X"="V"."X")
 
Column Projection Information (identified by operation id):
-----------------------------------------------------------
 
 1 - (#keys=0) "V"."N"[NUMBER,22], "V"."X"[CHARACTER,1], "V"."Y"[CHARACTER,1]
 2 - "V"."N"[NUMBER,22], "V"."X"[CHARACTER,1], "V"."Y"[CHARACTER,1]
 3 - (#keys=1) INTERNAL_FUNCTION("N")[22], "X"[CHARACTER,1], "Y"[CHARACTER,1]
 4 - "N"[NUMBER,22], "X"[CHARACTER,1], "Y"[CHARACTER,1]

A full table scan of the big table, with a call to the PL/SQL function for each row, and the sort operation on all rows. Then the top-10 rows are filtered and the nested loop operates on that. But you see the problem here: the cost of the full table scan and the order by has been evaluated correctly, but the cost after the VIEW operation is minimized.

My interpretation (but it is just a quick guess) is that the rowset is marked as ‘sorted’ and the optimizer then considers that the cost to get the first rows is minimal (as if it were coming from an index). However, this just ignores the initial cost of producing this rowset.

I can force with a hint the plan that I want – index full scan to avoid a sort and get the top-10 rows quickly:

SQL> explain plan for
select /*+ first_rows(10) INDEX_DESC(@"SEL$3" "DEMO1"@"SEL$3" ("DEMO1"."N")) */ *
from
( select v.*,dbms_random.value from DEMOV v)
where x in (select x from DEMO2)
order by n desc;
 
Explained.

This plan is estimated with a higher cost than the previous one, and this is why it was not chosen:

SQL> select * from dbms_xplan.display(format=>'+projection');
PLAN_TABLE_OUTPUT
Plan hash value: 2921908728
 
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 21 | 9 (0)| 00:00:01 |
| 1 | NESTED LOOPS SEMI | | 1 | 21 | 9 (0)| 00:00:01 |
| 2 | VIEW | DEMOV | 902 | 17138 | 9 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| DEMO1 | 968K| 17M| 8779 (1)| 00:00:01 |
| 4 | INDEX FULL SCAN DESCENDING| DEMO1_N | 968K| | 4481 (1)| 00:00:01 |
| 5 | VIEW PUSHED PREDICATE | VW_NSO_1 | 1 | 2 | 0 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | DEMO2_X | 1 | 2 | 0 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
6 - access("X"="V"."X")
 
Column Projection Information (identified by operation id):
-----------------------------------------------------------
 
1 - (#keys=0) "V"."N"[NUMBER,22], "V"."X"[CHARACTER,1], "V"."Y"[CHARACTER,1]
2 - "V"."N"[NUMBER,22], "V"."X"[CHARACTER,1], "V"."Y"[CHARACTER,1]
3 - "N"[NUMBER,22], "X"[CHARACTER,1], "Y"[CHARACTER,1]
4 - "DEMO1".ROWID[ROWID,10], "N"[NUMBER,22]

This cost estimation is fine. The cost of getting all rows by index access is higher than with a full table scan, but the optimizer knows that the actual cost is proportional to the number of rows fetched and then it adjusts the cost accordingly. This is fine here because the VIEW has only non-blocking operations. The problem in the first plan without the hint, was because the same arithmetic was done, without realizing that the SORT ORDER BY is a blocking operation and not a permanent sorted structure, and must be completed before being able to return the first row.
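To make that arithmetic concrete, here is a tiny sketch using the cost figures from the two plans above. This is a simplified model for illustration only, not the CBO's actual formula:

```python
def first_rows_cost(full_cost, total_rows, wanted_rows, blocking):
    """Sketch of first_rows(k) costing for a single row source.

    A non-blocking row source (e.g. an INDEX FULL SCAN) can be costed
    proportionally to the fraction of rows actually fetched; a blocking
    operation (e.g. SORT ORDER BY) must complete before returning the
    first row, so its full cost applies.
    """
    if blocking:
        return full_cost
    return full_cost * wanted_rows / total_rows

# Index full scan descending: cost 8779 for 968K rows, only 10 needed.
index_cost = first_rows_cost(8779, 968_000, 10, blocking=False)

# Full scan + sort: the SORT ORDER BY (cost 6863) is blocking.
sort_cost = first_rows_cost(6863, 968_000, 10, blocking=True)

# For a Top-10 query the index plan wins by a huge margin -- which is
# exactly what the first plan's arithmetic failed to account for.
assert index_cost < sort_cost
```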

In this example, as in the real case I’ve encountered, the difference in cost is very small (7 versus 9 here), which means that the plan can be ok one day and switch to the bad one (full scan, call the function for all rows, sort them) after a small change in statistics. Note that I mentioned that the plan was ok in 10g, but that may simply be related to the PGA settings and a different estimation of the cost of sorting.

 

The article CBO, FIRST_ROWS and VIEW misestimate appeared first on Blog dbi services.

A response to: What makes a community?

Thu, 2017-11-16 03:39

A recent tweet of mine resulted in Martin Widlake writing a really great blog post about What makes a community. Please read it before you continue reading this. There was another response from Stefan Koehler which is worth mentioning as well.

Both Martin and Stefan speak about Oracle communities, because that is where they are involved. At the beginning of Martin’s post he writes: “Daniel was not specific about if this was a work/user group community or a wider consideration of society, …” and this was intentional. I don’t think that it really matters much if we speak about a community around a product, a community that just comes together for drinking beer and discussing the latest football results, or even if we talk about a community as a family. At least the German translation “Gemeinschaft” applies to a family as well. This can be very few people (mother, father, kids) or more if we include brothers, sisters, grandmas and so on. But still the same rules that Martin outlines in his blog post apply: you’ll always have people driving the community, such as organizing dinners (when we speak about families), organizing conferences (when we speak about technical communities), organizing parties (when we talk about fun communities), or organizing whatever for whatever people make up the specific community. Then you’ll always have the people willing to help (the people Martin describes as the people who share and/or talk) and you’ll always have the people that consume/attend, which is good as well, because without them you’d have nothing to share and to organize.

We at dbi services are a community as well. As we work with various products, the community is not focused on a specific product (well, it is in the area of a specific product, of course) but rather on building an environment we like to work in. The community here is tied to technology but detached from a single product. We share the same methodologies, the same passion, and have fun attending great parties that are organized mostly by the non-technical people in our company. In this case you could say: the non-technical people are the drivers for the community of the company, even if the company is very technical by nature. And here we have the same situation again: some organize, some attend/consume and some share, but all are required (as Martin outlined in his post as well).

Of course I have to say something about the PostgreSQL community: because PostgreSQL is a real community project, the community around it is much more important than with other technical communities. I do not say that you do not need a community for vendor-controlled products, because when the vendor fails to build a community around its product, the product will fail as well. What I am saying is that the PostgreSQL community goes deeper, as the complete product is driven by the community. Of course there are companies that hire people working for the community, but they are not able to influence the direction if there is no agreement about the direction in the community. Sometimes this can make it very hard to progress and a lot of points need to be discussed, but in the end I believe it is better to have something which the majority agrees on. In the PostgreSQL community I think there are several drivers: for sure all the developers are drivers, and the people who take care of all the infrastructure (mailing lists, commitfests, …) are drivers as well. Basically everybody you can see on the mailing lists answering questions is a driver, because they keep the community active. Then we have all the people you see in other communities as well: those who share and those who consume/attend. I think you get the point: an open source community is by its nature far more active than what you usually see for non-open-source communities, for one reason: it already starts with the developers and not with a community around a final product. You can be part of such a community from the very beginning, which is writing new features and patches.

Coming back to the original question: What makes a community? Beside what Martin outlined there are several other key points:

  • The direction of the community (no matter if technical or not) must be so that people want to be part of that
  • When we speak about a community around a product: you must identify yourself with the product. When the product goes into a direction you cannot support for whatever reason, you’ll leave sooner or later. The more people leave, the weaker the community becomes
  • It must be easy to participate and to get help
  • A lot of people are willing to spend (free-) time to do stuff for the community
  • There must be a culture which respects you and everybody else
  • Maybe most important: A common goal and people that are able and willing to work together, even if this sometimes requires a lot of discussions

When you have all of these, the drivers, the people who share, and those that attend will come anyway, I believe.

 

The article A response to: What makes a community? appeared first on Blog dbi services.

Can I do it with PostgreSQL? – 18 – Instead of triggers on views

Wed, 2017-11-15 09:18

It has been quite a while since the last post in this series but today comes the next one. Being at a customer this morning this question popped up: Can we have instead of triggers on a view in PostgreSQL as well? I couldn’t answer immediately (although I was quite sure you can), so here is the test. I took an example for Oracle from here and re-wrote it in PostgreSQL syntax.

I took the same tables and adjusted the data types:

CREATE TABLE CUSTOMER_DETAILS ( CUSTOMER_ID INT PRIMARY KEY
                              , CUSTOMER_NAME VARCHAR(20)
                              , COUNTRY VARCHAR(20)
                              );
CREATE TABLE PROJECTS_DETAILS ( PROJECT_ID INT PRIMARY KEY
                              , PROJECT_NAME VARCHAR(30)
                              , PROJECT_START_DATE DATE
                              , CUSTOMER_ID INT REFERENCES CUSTOMER_DETAILS(CUSTOMER_ID)
                              );

The same view definition:

CREATE OR REPLACE VIEW customer_projects_view AS
   SELECT cust.customer_id, cust.customer_name, cust.country,
          projectdtls.project_id, projectdtls.project_name, 
          projectdtls.project_start_Date
   FROM customer_details cust, projects_details projectdtls
   WHERE cust.customer_id = projectdtls.customer_id;

Try to insert:

postgres=# INSERT INTO customer_projects_view VALUES (1,'XYZ Enterprise','Japan',101,'Library management',now());
ERROR:  cannot insert into view "customer_projects_view"
DETAIL:  Views that do not select from a single table or view are not automatically updatable.
HINT:  To enable inserting into the view, provide an INSTEAD OF INSERT trigger or an unconditional ON INSERT DO INSTEAD rule.
Time: 2.135 ms

… and the answer is already in the error message. So obviously we should be able to do that. In PostgreSQL you need a trigger function:

CREATE OR REPLACE FUNCTION cust_proj_view_insert_proc() RETURNS trigger AS $$
BEGIN
        
   INSERT INTO customer_details (customer_id,customer_name,country)
          VALUES (NEW.customer_id, NEW.customer_name, NEW.country);

   INSERT INTO projects_details (project_id, project_name, project_start_Date, customer_id)
   VALUES (
     NEW.project_id,
     NEW.project_name,
     NEW.project_start_Date,
     NEW.customer_id);

   RETURN NEW;
     EXCEPTION WHEN unique_violation THEN
       RAISE EXCEPTION 'Duplicate customer or project id';
END;
$$ LANGUAGE plpgsql;

Then we need a trigger calling this function:

create trigger cust_proj_view_insert_trg 
    instead of insert on customer_projects_view for each row EXECUTE procedure cust_proj_view_insert_proc();

Try the insert again:

INSERT INTO customer_projects_view VALUES (1,'XYZ Enterprise','Japan',101,'Library management',now());
INSERT INTO customer_projects_view VALUES (2,'ABC Infotech','India',202,'HR management',now());

… and here we are:

postgres=# select * FROM customer_details;
 customer_id | customer_name  | country 
-------------+----------------+---------
           1 | XYZ Enterprise | Japan
           2 | ABC Infotech   | India

Definitely, you can :)

 

The article Can I do it with PostgreSQL? – 18 – Instead of triggers on views appeared first on Blog dbi services.

Auto pre-warming in EDB Postgres Advanced Server 10

Wed, 2017-11-15 06:28

Some days ago EDB Postgres Advanced Server 10 was released and one feature which might be handy is auto pre-warming. What this does is save all the buffers (or rather a description of the buffers) which are currently loaded into shared_buffers to disk, and then re-read the buffers automatically when the instance is restarted. Let’s see how it works.

Before getting the feature to work we need to look at two parameters which control the behavior:

  • pg_prewarm.autoprewarm: Enables or disables the feature
  • pg_prewarm.autoprewarm_interval: The interval at which the current state is written to disk, or 0 to only write once when the instance shuts down

Another requirement is to load the library when the instance starts:

postgres=# alter system set shared_preload_libraries ='$libdir/dbms_pipe,$libdir/edb_gen,$libdir/dbms_aq,$libdir/pg_prewarm';
ALTER SYSTEM

Once the instance is restarted we can proceed with the configuration:

postgres=# alter system set pg_prewarm.autoprewarm=true;
ALTER SYSTEM
postgres=# alter system set pg_prewarm.autoprewarm_interval='10s';
ALTER SYSTEM

By doing this we told the server to write the current state of the buffers to disk every 10 seconds. You’ll also notice a new background worker process which is responsible for doing the work:

postgres=# \! ps -ef | grep prewarm | egrep -v "ps|grep"
postgres  3682  3675  0 12:05 ?        00:00:00 postgres: bgworker: autoprewarm   

Let’s load something into shared_buffers:

postgres=# insert into t1 select a, md5(a::varchar) from generate_series(1,1000) a;
INSERT 0 1000
postgres=# select count(*) from t1;
 count 
-------
  1000
(1 row)
postgres=# explain (analyze,buffers) select count(*) from t1;
                                               QUERY PLAN                                                
---------------------------------------------------------------------------------------------------------
 Aggregate  (cost=21.50..21.51 rows=1 width=8) (actual time=0.492..0.492 rows=1 loops=1)
   Buffers: shared hit=9
   ->  Seq Scan on t1  (cost=0.00..19.00 rows=1000 width=0) (actual time=0.019..0.254 rows=1000 loops=1)
         Buffers: shared hit=9
 Planning time: 0.070 ms
 Execution time: 0.538 ms
(6 rows)

The “shared hit” confirms that we read the buffers from shared_buffers and not from the OS/file system cache. Now let’s restart and do the same check again:

postgres@centos7:/u02/pgdata/PG4/ [EDB10] pg_ctl -D . restart -m fast
postgres@centos7:/u02/pgdata/PG4/ [EDB10] psql -X postgres
psql.bin (10.1.5)
Type "help" for help.

postgres=# explain (analyze,buffers) select count(*) from t1;
                                               QUERY PLAN                                                
---------------------------------------------------------------------------------------------------------
 Aggregate  (cost=21.50..21.51 rows=1 width=8) (actual time=0.586..0.586 rows=1 loops=1)
   Buffers: shared hit=9
   ->  Seq Scan on t1  (cost=0.00..19.00 rows=1000 width=0) (actual time=0.024..0.295 rows=1000 loops=1)
         Buffers: shared hit=9
 Planning time: 0.451 ms
 Execution time: 0.766 ms
(6 rows)

postgres=# 

… here we go. How is this information stored? When you take a look at $PGDATA you’ll notice a file with the following format:

postgres@centos7:/u02/pgdata/PG4/ [EDB10] cat $PGDATA/autoprewarm.blocks | tail
<>
0,1664,1262,0,0
15471,1663,1259,0,0
15471,1663,1259,0,1
15471,1663,1259,0,2
15471,1663,1249,0,0
15471,1663,1249,0,1
15471,1663,1249,0,2
15471,1663,1249,0,3
15471,1663,1249,0,4

The first field is the OID of the database:

postgres=# select oid,datname from pg_database where oid=15471;
  oid  | datname  
-------+----------
 15471 | postgres
(1 row)

The second one is the tablespace:

postgres=# select oid,spcname from pg_tablespace where oid=1663;
 oid  |  spcname   
------+------------
 1663 | pg_default
(1 row)

The third one is the table:

postgres=# select oid,relname from pg_class where oid = 16402;
  oid  | relname 
-------+---------
 16402 | t1
(1 row)

postgres=# \! grep 16402 $PGDATA/autoprewarm.blocks
15471,1663,16402,0,0
15471,1663,16402,0,1
15471,1663,16402,0,2
15471,1663,16402,0,3
15471,1663,16402,0,4
15471,1663,16402,0,5
15471,1663,16402,0,6
15471,1663,16402,0,7
15471,1663,16402,0,8
15471,1663,16402,1,0
15471,1663,16402,1,2

The fourth one is the fork/file (0 is the datafile, 1 is the free space map) and the last one is the actual block to load. This is also described in “./contrib/pg_prewarm/autoprewarm.c” in the PostgreSQL source code:

/* Metadata for each block we dump. */
typedef struct BlockInfoRecord
{
        Oid                     database;
        Oid                     tablespace;
        Oid                     filenode;
        ForkNumber      forknum;
        BlockNumber blocknum;
} BlockInfoRecord;
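Based on that record layout, the dump file can be decoded with a few lines of scripting. A quick sketch, assuming the comma-separated format shown above (the helper name is mine):

```python
from collections import namedtuple

# Mirrors the BlockInfoRecord struct from autoprewarm.c
BlockInfoRecord = namedtuple(
    "BlockInfoRecord", "database tablespace filenode forknum blocknum")

def parse_autoprewarm_line(line):
    """Parse one data line of autoprewarm.blocks into its five fields."""
    return BlockInfoRecord(*(int(f) for f in line.strip().split(",")))

rec = parse_autoprewarm_line("15471,1663,16402,0,5")
# -> database 15471 (postgres), tablespace 1663 (pg_default),
#    relation 16402 (t1), fork 0 (main data file), block 5
```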

For community PostgreSQL there is the contrib module pg_prewarm which you can use for that, check here.

 

The article Auto pre-warming in EDB Postgres Advanced Server 10 appeared first on Blog dbi services.

Dynamic Sampling vs. Extended Statistics

Tue, 2017-11-14 10:09

On datawarehouse databases, I frequently recommend increasing the level of dynamic sampling because:

  • Queries have complex predicates with AND, OR, IN(), ranges and correlated values for which the optimizer cannot estimate the cardinality properly
  • Queries are long anyway (compared to OLTP) and can afford more parse time to get an optimized execution plan

However, there’s a drawback with this approach because sometimes the dynamic sampling estimation may give bad estimations, and supersedes the static statistics which were better. Here is an example in 12.2.0.1

I run with the following parameters:

SQL> show parameter adaptive;
NAME TYPE VALUE
--------------------------------- ------- -----
optimizer_adaptive_plans boolean TRUE
optimizer_adaptive_reporting_only boolean FALSE
optimizer_adaptive_statistics boolean FALSE
optimizer_dynamic_sampling integer 4

The Dynamic Sampling level comes from the previous version (11g) and the Adaptive Statistics have been disabled because of all the problems seen in 12cR1 with Adaptive Dynamic Sampling bugs.

I have a query with very bad response time for some values, going to nested loops for 50000 rows. The reason is an under-estimate in the following part of the query:

SQL> explain plan for
2 SELECT count(*) FROM "APP_OWNR"."TBL_APPLICATION1_ID" "TBL_APPLICATION1_ID" WHERE upper("TBL_APPLICATION1_ID"."OPRID") =upper ('qwertz');
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
Plan hash value: 2187255533
 
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 964 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
|* 2 | TABLE ACCESS FULL| TBL_APPLICATION1_ID | 82 | 574 | 964 (1)| 00:00:01 |
------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
2 - filter(UPPER("OPRID")='QWERTZ')
 
Note
-----
- dynamic statistics used: dynamic sampling (level=4)

The estimation is 82 rows but there are actually 50000 rows. We can see that dynamic sampling was used. The misestimate is probably caused by a sample that is too small.

Ok, a query with an UPPER() function on the column is not a good idea. Let’s try to gather statistics for the expression:

SQL> exec dbms_stats.gather_table_stats('APP_OWNR','TBL_APPLICATION1_ID',method_opt=>'for all columns size auto for columns (upper(OPRID)) size auto');
PL/SQL procedure successfully completed.
 
SQL> explain plan for
2 SELECT count(*) FROM "APP_OWNR"."TBL_APPLICATION1_ID" "TBL_APPLICATION1_ID" WHERE upper("TBL_APPLICATION1_ID"."OPRID") =upper ('qwertz');
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
Plan hash value: 2187255533
 
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 964 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
|* 2 | TABLE ACCESS FULL| TBL_APPLICATION1_ID | 82 | 574 | 964 (1)| 00:00:01 |
------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
2 - filter(UPPER("OPRID")='QWERTZ')
 
Note
-----
- dynamic statistics used: dynamic sampling (level=4)

We have the same misestimate. But the problem is not our statistics on the expression. This query is still doing dynamic sampling.

Here’s the proof that we have good static statistics:

SQL> alter session set optimizer_dynamic_sampling=2;
Session altered.
 
SQL> explain plan for
2 SELECT count(*) FROM "APP_OWNR"."TBL_APPLICATION1_ID" "TBL_APPLICATION1_ID" WHERE upper("TBL_APPLICATION1_ID"."OPRID") =upper ('qwertz');
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
Plan hash value: 2187255533
 
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 964 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
|* 2 | TABLE ACCESS FULL| TBL_APPLICATION1_ID | 48594 | 332K| 964 (1)| 00:00:01 |
------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
2 - filter(UPPER("OPRID")='QWERTZ')

Dynamic Sampling did not occur at level 2. Now the estimation is correct thanks to the extended statistics. I have a top-frequency histogram where the cardinality of popular values is exact.

The problem is that dynamic sampling is supposed to add more information to the optimizer, but in this case it replaces the static statistics. At level 4, dynamic sampling is done as soon as there is a complex predicate in the where clause. And the use of the UPPER() function is considered a complex predicate. However, in this case, because I have extended statistics, it should be considered a simple column=value predicate.
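The observed behaviour can be summarised as a small decision function. This is a conceptual model of what the plans above showed, not Oracle's actual implementation:

```python
def uses_dynamic_sampling(level, complex_predicate, extended_stats_match):
    """Model of when level-4 dynamic sampling supersedes static statistics.

    Observed above: at level 4 any complex predicate (such as
    UPPER(col) = value) triggers sampling, even when extended statistics
    match the expression exactly -- ideally the third argument would
    suppress it, but it does not.
    """
    if level <= 2:
        return False  # default behaviour: static statistics are used
    # extended_stats_match is ignored -- this is the problem described.
    return complex_predicate

# UPPER() predicate at level 4: sampled, despite matching extended stats.
assert uses_dynamic_sampling(4, True, True)
# Same query at level 2: the extended statistics are used instead.
assert not uses_dynamic_sampling(2, True, True)
```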

Here I’ve set dynamic sampling manually, but this is also what happens when SQL Plan Directives trigger the use of Dynamic Sampling and the good histogram is ignored. This reminds me of a Ludovico Caldara blog post about SPD.

Here, maybe, the solution would be Adaptive Dynamic Sampling which may increase the level of sampling when needed:

SQL> alter session set optimizer_dynamic_sampling=11;
Session altered.
 
SQL> explain plan for
2 SELECT count(*) FROM "APP_OWNR"."TBL_APPLICATION1_ID" "TBL_APPLICATION1_ID" WHERE upper("TBL_APPLICATION1_ID"."OPRID") =upper ('qwertz');
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
Plan hash value: 2187255533
 
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 964 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
|* 2 | TABLE ACCESS FULL| TBL_APPLICATION1_ID | 37831 | 258K| 964 (1)| 00:00:01 |
------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
2 - filter(UPPER("OPRID")='QWERTZ')
 
Note
-----
- dynamic statistics used: dynamic sampling (level=AUTO)

In this case, Adaptive Dynamic Sampling is a good approximation. But it would be better to have a level of dynamic sampling that does not consider a predicate as a complex one when the extended statistics exactly match the predicate. Before there is enough artificial intelligence to cope with this, the best recommendation is to focus on design. In this case, ensuring that we have only uppercase values is the best way to keep queries and estimations simple.

 

The article Dynamic Sampling vs. Extended Statistics appeared first on Blog dbi services.

Documentum xPlore: Tuning of JVM for high throughput and low CPU usage

Sun, 2017-11-12 02:05

Sometimes you have Java processes or even JBoss servers using a lot of CPU. In my example I had an xPlore dsearch server using about 98% of the CPU. Using jconsole and jvisualvm I figured out that the garbage collector was using 50 to 60% of the CPU time.
This was because the server was indexing and accessing the internal DB very often. Hence a lot of objects were created; the JVM was not correctly sized, so all objects were promoted to the tenured (old) space, filling up the heap. The garbage collector had to go through the whole heap and perform a lot of Full GCs. It got to a point where I had a Full GC every 5 seconds that lasted 4 seconds. So I had only 1 second of “real” processing every 5 seconds.
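To put a number on that situation: a 4-second Full GC every 5 seconds means the collector consumes 80% of wall-clock time. A quick back-of-the-envelope check:

```python
def gc_overhead(pause_seconds, interval_seconds):
    """Fraction of wall-clock time spent paused in GC."""
    return pause_seconds / interval_seconds

# The situation described above: a 4s Full GC every 5 seconds.
print(f"GC overhead: {gc_overhead(4.0, 5.0):.0%}")  # prints: GC overhead: 80%
```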

So if you have a process stuck in collecting garbage you can use the following parameters:
USER_MEM_ARGS=”-Xms8G -Xmx8G -XX:PermSize=64m -XX:NewSize=2g -XX:MaxNewSize=2g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=95 -XX:+UseCMSInitiatingOccupancyOnly -XX:MaxTenuringThreshold=2 -XX:MaxPermSize=256m -Xss1024k -Xloggc:/pkgs/dmsp/opt/documentum/xPlore/jboss7.1.1/server/DctmServer_PrimaryDsearch/logs/PrimaryJVM.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution”

-Xms8G: Initial heap size of the JVM
-Xmx8G: Maximum heap size
-XX:PermSize=64m: Initial permanent space size
-XX:MaxPermSize=256m: Maximum permanent space size
-XX:NewSize=2g: Young generation space size, here 1/4 of the total
-XX:MaxNewSize=2g: Maximum young generation space size
-XX:+UseParNewGC: Parallel copying collector for the young generation; parallelizes the collection process with multiple threads, giving better performance on multi-processor architectures
-XX:+UseConcMarkSweepGC: Use concurrent mark-sweep collection for the old generation
-XX:+CMSParallelRemarkEnabled: Parallelize the remark phase; goes with the CMS option, improves response time
-XX:+ParallelRefProcEnabled: Parallelize the processing of weakly referenced objects (cache)
-XX:+CMSClassUnloadingEnabled: Enables the class unloading capability for CMS
-XX:CMSInitiatingOccupancyFraction=95: Sets the occupancy limit after which the Full GC will be triggered; a higher value means fewer but longer Full GCs
-XX:+UseCMSInitiatingOccupancyOnly: Prevents the JVM from using heuristic GC triggering rules; only the previous percentage is used as the threshold for the Full GC trigger
-XX:MaxTenuringThreshold=2: Maximum value for the tenuring threshold. The default value is 15
-Xss1024k: Thread stack size

These arguments will help you see the behaviour of the GC process:
-Xloggc:/path/to/log/JVM.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintTenuringDistribution

 

After tuning the JVM I went back to a normal behaviour:

jvm-all-green

 

The article Documentum xPlore: Tuning of JVM for high throughput and low CPU usage appeared first on Blog dbi services.

Documentum xPlore: Several ways to start an Index Agent

Sun, 2017-11-12 01:30

In order to start index agents, you have several ways, depending on how you need to start them.

1. Use the documentum job dm_FTIndexAgentBoot. If you set start_index_agents=T in the server.ini, the job will be called when the docbases are started.

2. Use the web interface:
Login to http://server:9200/IndexAgent/ with docbase credentials and select “Start in normal mode”

3. Login to DA and go to Indexing Management -> Index Agents and Index Servers
Right click on an index agent and select “Start Agent”

4. Use IAPI command to start it:

select index_name from dm_fulltext_index;
DOCBASE_ftindex_00
select object_name from dm_ftindex_agent_config;
server_9200_IndexAgent

apply,c,,FTINDEX_AGENT_ADMIN,NAME,S,DOCBASE_ftindex_00,AGENT_INSTANCE_NAME,S,server_9200_IndexAgent,ACTION,S,start
next,c,q0
dump,c,q0

5. Use Java:
java -cp $DM_HOME/lib/server-impl.jar:$DOCUMENTUM/dfc/dfc.jar com.documentum.server.impl.utils.IndexAgentCtrl -docbase_name <docbasename> -user_name <username> -action start

 

The article Documentum xPlore: Several ways to start an Index Agent appeared first on Blog dbi services.

Documentum – Unable to stop an IDS configured in SSL?

Fri, 2017-11-10 15:11

When working with the IDS, you might face some interesting behaviors, as mentioned in the last blog I wrote for example. This one will focus on the SSL part of the IDS on the target side. In this blog, I will start by showing the content of our start/stop scripts and how they work in non-SSL, then switch to SSL and try again. For this blog, I quickly installed a test IDS 7.3 using the default non-SSL port (2788).

 

So to start and stop the IDS on the target side, we are using custom scripts/services that do not contain any port information in their names because it might change or just to be able to start several agents at the same time, aso… So an example of start/stop scripts that can be used for the IDS would be:

[ids@target_server_01 ~]$ cat ~/.bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs
PATH=$PATH:$HOME/bin
export PATH

export TZ=UTC
export IDS_HOME=/app/ids/target
export JAVA_HOME=$IDS_HOME/product/jre/linux
export PATH=$JAVA_HOME/bin:$PATH
[ids@target_server_01 ~]$
[ids@target_server_01 ~]$ cd $IDS_HOME/admin
[ids@target_server_01 admin]$
[ids@target_server_01 admin]$
[ids@target_server_01 admin]$ cat startIDSTargetCustom.sh
#! /bin/sh
. ~/.bash_profile
echo "Starting the Interactive Delivery Services Target..."
NB_PID=`pgrep -f "com.documentum.webcache.transfer.MigAgent" | wc -l`
if [[ $NB_PID != 0 ]]; then
  echo "The Interactive Delivery Services Target is already running."
else
  if [[ -f $IDS_HOME/admin/nohup-IDSTarget.out ]]; then
    mv $IDS_HOME/admin/nohup-IDSTarget.out $IDS_HOME/admin/nohup-IDSTarget.out_`date +%F_%H%M%S`.out
  fi
  nohup $IDS_HOME/admin/dm_start_ids >> $IDS_HOME/admin/nohup-IDSTarget.out 2>&1 &
  echo "The Interactive Delivery Services Target has been started... Sleeping for 30 seconds."
  sleep 30
fi
# End of File
[ids@target_server_01 admin]$
[ids@target_server_01 admin]$
[ids@target_server_01 admin]$ cat dm_start_ids
#! /bin/sh
. ~/.bash_profile
$JAVA_HOME/bin/java -Xms6g -Xmx6g -Dfile.encoding=UTF-8 -Djava.security.egd=file:///dev/./urandom -cp "$JAVA_HOME/lib/ext/*" com.documentum.webcache.transfer.MigAgent $IDS_HOME/admin/config/2788/agent.ini &
# End of File
[ids@target_server_01 admin]$
[ids@target_server_01 admin]$
[ids@target_server_01 admin]$ cat dm_stop_ids
#!/bin/sh
. ~/.bash_profile
$JAVA_HOME/bin/java -Djava.security.egd=file:///dev/./urandom -cp "$JAVA_HOME/lib/ext/*" com.documentum.webcache.transfer.Shutdown $IDS_HOME/admin/config/2788/agent.ini $1 $2
# End of File
[ids@target_server_01 admin]$

 

So when the IDS is configured in non-SSL, this is the configuration of the agent.ini (the default one) and the behavior when you try to start/stop it:

[ids@target_server_01 admin]$ cat config/2788/agent.ini
[conn]
transfer_directory=/data/IDS
secure_connection=raw
http_port=2788
https_ca_cert=$IDS_HOME/admin/keys/ca-cert.der
https_server_cert=$IDS_HOME/admin/keys/server-cert.der
https_server_key=$IDS_HOME/admin/keys/server-key.der
check_pass=$IDS_HOME/product/tools/dm_check_password
log_file=$IDS_HOME/admin/log/2788.log
target_database_connection=jdbc:oracle:thin:@(description=(address=(host=database_server_01)(protocol=tcp)(port=1521))(connect_data=(sid=IDSSID)))
database_user=IDS_USER
database_user_pass=SCS_ENCR_TEXT/A1G8H1FBH12ZECB2P917GEN31ZCBGGC2N2HRC2CNZY
JDBC_DRIVER=oracle.jdbc.driver.OracleDriver
[ids@target_server_01 admin]$
[ids@target_server_01 admin]$ grep -E "^secure_connection|^http.*port" config/2788/agent.ini
secure_connection=raw
http_port=2788
[ids@target_server_01 admin]$
[ids@target_server_01 admin]$ ./startIDSTargetCustom.sh
Starting the Interactive Delivery Services Target...
The Interactive Delivery Services Target has been started... Sleeping for 30 seconds.
[ids@target_server_01 admin]$
[ids@target_server_01 admin]$ ./dm_stop_ids
Nov 11 08:38:42.714:T:main: INFO:       Setting socket TCP no delay to true.
Beginning shutdown...
Shutdown completed
[ids@target_server_01 admin]$
[ids@target_server_01 admin]$ cat nohup-IDSTarget.out
Nov 11 08:33:19.493:T:main: INFO:       Begin logging on: $IDS_HOME/admin/log/2788.log
Nov 11 08:33:19.493:T:main: INFO:       MigAgent.java Starting...
Nov 11 08:33:19.496:T:main: INFO:       Interactive Delivery Services - Version 7.3.0010.0003
Nov 11 08:33:19.496:T:main: INFO:       Total process heap space (bytes) : 5368709120
Nov 11 08:33:19.886:T:main: INFO:       HTTP Port:      2788
Nov 11 08:38:42.700:T:Thread-0: INFO:   Setting socket TCP no delay to true.
Nov 11 08:38:42.710:T:Thread-1: INFO:   --------------------------
Nov 11 08:38:42.717:T:Thread-1: INFO:   Checking for valid SHUTDOWN request
Nov 11 08:38:42.718:T:Thread-1: INFO:   Valid SHUTDOWN Request
Nov 11 08:38:42.718:T:Thread-1: INFO:   Shutdown command received, beginning shutdown...
Nov 11 08:38:42.718:T:Thread-1: INFO:   Shutdown complete.
[ids@target_server_01 admin]$

 

So this is working as expected for both the start and stop commands. I didn't show an End-to-End test or an export here, but these are also working properly. Switching the IDS Target configuration to SSL can then be done pretty easily. I will let you check the documentation on how to regenerate the SSL certificate if you want to (it is recommended), but that's basically done using the script $IDS_HOME/product/bin/GenCerts. So let's switch our IDS to SSL and then try again to stop/start it:

[ids@target_server_01 admin]$ grep -E "^secure_connection|^http.*port" config/2788/agent.ini
secure_connection=raw
http_port=2788
[ids@target_server_01 admin]$
[ids@target_server_01 admin]$ sed -i 's,^secure_connection=.*,secure_connection=ssl,' config/2788/agent.ini
[ids@target_server_01 admin]$ sed -i 's,^http.*port=.*,https_port=2788,' config/2788/agent.ini
[ids@target_server_01 admin]$
[ids@target_server_01 admin]$ grep -E "^secure_connection|^http.*port" config/2788/agent.ini
secure_connection=ssl
https_port=2788
[ids@target_server_01 admin]$
[ids@target_server_01 admin]$ ./startIDSTargetCustom.sh
Starting the Interactive Delivery Services Target...
The Interactive Delivery Services Target has been started... Sleeping for 30 seconds.
[ids@target_server_01 admin]$
[ids@target_server_01 admin]$
[ids@target_server_01 admin]$ ./dm_stop_ids
Connecting to secure WebCache at localhost:2788...
Nov 11 08:45:23.587:T:main: INFO:       Server Certificate : $IDS_HOME/admin/keys/server-cert.der
Nov 11 08:45:23.587:T:main: INFO:       CA Certificate : $IDS_HOME/admin/keys/ca-cert.der
Nov 11 08:45:23.587:T:main: INFO:       Server Key : $IDS_HOME/admin/keys/server-key.der
Connected!
Certificates are valid.
Nov 11 08:45:23.747:T:main: INFO:       Setting socket TCP no delay to true.
com.rsa.ssl.SSLException: An IOException occured while collecting the handshake digests: / by zero
        at com.rsa.ssl.tls1.TLSV1ClientProtocol.stateMachine(TLSV1ClientProtocol.java:283)
        at com.rsa.ssl.tls1.TLSV1ClientProtocol.init(TLSV1ClientProtocol.java:163)
        at com.rsa.ssl.tls1.TLSV1ClientProtocol.<init>(TLSV1ClientProtocol.java:127)
        at com.rsa.ssl.common.TLSV1Loader.startTLS1ClientProtocol(TLSV1Loader.java:336)
        at com.rsa.ssl.common.ClientProtocol.sendHello(ClientProtocol.java:243)
        at com.rsa.ssl.common.ClientProtocol.startHandshake(ClientProtocol.java:379)
        at com.rsa.ssl.SSLSocket.getOutputStream(SSLSocket.java:229)
        at com.documentum.webcache.transfer.Client.<init>(Unknown Source)
        at com.documentum.webcache.transfer.Shutdown.<init>(Unknown Source)
        at com.documentum.webcache.transfer.Shutdown.main(Unknown Source)
<B> <FONT color="red">
Nov 11 08:45:23.960:T:main: ERROR:      Client(): creating data streamscom.rsa.ssl.SSLException: An IOException occured while collecting the handshake digests: / by zero
</FONT> </B>
Error creating shutdown object.Error creating data streams
An IOException occured while collecting the handshake digests: / by zero
Error creating data streams
An IOException occured while collecting the handshake digests: / by zero
[ids@target_server_01 admin]$
[ids@target_server_01 admin]$
[ids@target_server_01 admin]$ cat nohup-IDSTarget.out
Nov 11 08:40:37.074:T:main: INFO:       Begin logging on: $IDS_HOME/admin/log/2788.log
Nov 11 08:40:37.075:T:main: INFO:       MigAgent.java Starting...
Nov 11 08:40:37.077:T:main: INFO:       Interactive Delivery Services - Version 7.3.0010.0003
Nov 11 08:40:37.077:T:main: INFO:       Total process heap space (bytes) : 5368709120
Nov 11 08:40:37.426:T:main: INFO:       HTTPS Port:     2788
Nov 11 08:40:37.426:T:main: INFO:       Server Certificate : $IDS_HOME/admin/keys/server-cert.der
Nov 11 08:40:37.426:T:main: INFO:       CA Certificate : $IDS_HOME/admin/keys/ca-cert.der
Nov 11 08:40:37.426:T:main: INFO:       Server Key : $IDS_HOME/admin/keys/server-key.der
Nov 11 08:45:23.744:T:Thread-0: INFO:   Setting socket TCP no delay to true.
<B> <FONT color="red">
Nov 11 08:45:23.976:T:Thread-0: ERROR:  Exception: An IOException occured while reading the finished message: read() error
</FONT> </B>
<B> <FONT color="red">
Nov 11 08:45:23.977:T:Thread-0: ERROR:  Exception: Error Spawning new requestor
</FONT> </B>
[ids@target_server_01 admin]$
[ids@target_server_01 admin]$
[ids@target_server_01 admin]$ ps -ef | grep MigAgent | grep -v grep
ids   15398     1  0 08:40 pts/2    00:00:04 $JAVA_HOME/bin/java -Xms6g -Xmx6g -Dfile.encoding=UTF-8 -Djava.security.egd=file:///dev/./urandom -cp $JAVA_HOME/lib/ext/* com.documentum.webcache.transfer.MigAgent $IDS_HOME/admin/config/2788/agent.ini
[ids@target_server_01 admin]$

 

As you can see above, the stop command isn't working at all: it doesn't do anything, since the process is still up and running. When you try to stop the IDS, it fails, apparently because of a division by zero. You can check the different configuration files, you can check that the IDS is working properly from End-to-End, you can do a lot of things (like I did) but you will most likely not find any solution. This is actually a known issue on the OpenText side and it is documented as part of SCS-3683. So how can you stop the IDS Target process then? Well, the only way is to kill it… An updated stop script that works for both non-SSL and SSL IDS Agents would look like this:

[ids@target_server_01 admin]$ cat dm_stop_ids
#!/bin/sh
. ~/.bash_profile
AGENT_PORT="2788"
CONN_MODE=`grep "^secure_connection" $IDS_HOME/admin/config/${AGENT_PORT}/agent.ini | sed 's,^secure_connection[[:space:]]*=[[:space:]]*,,'`
if [[ "$CONN_MODE" == "ssl" ]]; then
  IDS_PID=`pgrep -f "com.documentum.webcache.transfer.MigAgent.*${AGENT_PORT}"`
  if [[ $IDS_PID != '' ]]; then
    kill $IDS_PID
    sleep 5
    IDS_PID=`pgrep -f "com.documentum.webcache.transfer.MigAgent.*${AGENT_PORT}"`
    if [[ $IDS_PID != '' ]]; then
      kill -9 $IDS_PID
    fi
    echo "The Interactive Delivery Services Target has been stopped..."
  else
    echo "The Interactive Delivery Services Target is already stopped."
  fi
else
  $JAVA_HOME/bin/java -Djava.security.egd=file:///dev/./urandom -cp "$JAVA_HOME/lib/ext/*" com.documentum.webcache.transfer.Shutdown $IDS_HOME/admin/config/${AGENT_PORT}/agent.ini $1 $2
fi
# End of File
[ids@target_server_01 admin]$
[ids@target_server_01 admin]$ ./dm_stop_ids
The Interactive Delivery Services Target has been stopped...
[ids@target_server_01 admin]$
[ids@target_server_01 admin]$ ps -ef | grep MigAgent | grep -v grep
[ids@target_server_01 admin]$
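By the way, the grep/sed parsing of the "secure_connection" key in this script tolerates optional spaces around the "=" sign. If you want to check that behavior without a real IDS around, here is a small self-contained sketch using a throwaway agent.ini (hypothetical content, just for illustration):

```shell
#!/bin/sh
# Throwaway agent.ini just to exercise the parsing used in the stop script;
# the real file lives under $IDS_HOME/admin/config/<port>/agent.ini.
TMP_INI=$(mktemp)
printf 'transfer_directory=/data/IDS\nsecure_connection = ssl\nhttps_port=2788\n' > "$TMP_INI"

# Same grep/sed pipeline as in dm_stop_ids: strip the key name and any
# whitespace around the "=" sign, keeping only the value.
CONN_MODE=$(grep "^secure_connection" "$TMP_INI" | sed 's,^secure_connection[[:space:]]*=[[:space:]]*,,')
echo "$CONN_MODE"

rm -f "$TMP_INI"
```

This prints "ssl" even though the test file has spaces around the "=", which is exactly why the stop script uses the `[[:space:]]*` pattern instead of a plain `secure_connection=` match.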

 

It annoys me to have to kill a process in order to stop it but since, according to OpenText, there is no other solution…

 

 

The article Documentum – Unable to stop an IDS configured in SSL? appeared first on Blog dbi services.

Documentum – Unable to configure IDS 7.3 for a docbase

Fri, 2017-11-10 13:30

In this blog, I will talk about an issue with the IDS 7.3 when it is installed on a Content Server 7.3. The IDS is the Interactive Delivery Services, a product provided by OpenText that needs to be installed on a Content Server (for the "Source" part) and on a Target Server (for the "Target" part). The IDS can be used to publish content from a Documentum repository to a target machine, where it can then be consumed by another application.

 

When installing/configuring an IDS, there are several things to do:

  1. Install the Source + patch if needed
  2. Configure a docbase on the Source (basically install DARs)
  3. Install the Target + patch if needed
  4. Configure a docbase on the Target (basically setup of an agent which will use a DB + file system for the exported documents)

 

In this blog, I will only talk about an issue which will occur if you try to execute step 2 with an IDS 7.3 on a Content Server 7.3. Once the IDS is installed and patched (if needed), you can configure a docbase using the config.bin file:

[dmadmin@content_server_01 ~]$ cd $DM_HOME/webcache/install/
[dmadmin@content_server_01 install]$ 
[dmadmin@content_server_01 install]$ ./config.bin

 

On the IDS Source Configurator, you just have to select the docbase you want to configure and it will start the configuration of that docbase. As mentioned above, the main thing it does is install the DARs that are placed under "$DM_HOME/webcache/install/SCSDar/", using the Headless Composer. When you do that, an error message will be printed saying the following:

DiWAWebcsConfigureDocbase failed! - Could not deploy $DM_HOME/webcache/install/SCSDar/SCSDocApp.dar.
 Please check dar installation log file $DM_HOME/webcache/install/SCSDar/DocBase1_SCSDocApp_dar.log for the installation exceptions.
Errors occured while invoking Headless Composer.; Runtime execution failed with child process "$DOCUMENTUM_SHARED/java/1.7.0_72/jre/bin/java" exit code of 13; For more detailed information, see the error log: $DM_HOME/webcache/install/setupError.log

 

So what's the issue? To understand how twisted the IDS is, let's first talk about the CS 7.3. When you install a Content Server 7.3, it installs the binaries, the JMS, the Headless Composer and a single Java: $DOCUMENTUM_SHARED/java64/JAVA_LINK. This is actually a symbolic link to the real Java version installed by a basic CS 7.3: $DOCUMENTUM_SHARED/java64/1.8.0_77. So for a Content Server 7.3 it's simple: everything uses Java 8u77.

[dmadmin@content_server_01 install]$ echo $JAVA_HOME
$DOCUMENTUM_SHARED/java64/JAVA_LINK
[dmadmin@content_server_01 install]$
[dmadmin@content_server_01 install]$ ls -l $JAVA_HOME
lrwxrwxrwx. 1 dmadmin dmadmin 39 Oct  5 08:07 $DOCUMENTUM_SHARED/java64/JAVA_LINK -> $DOCUMENTUM_SHARED/java64/1.8.0_77
[dmadmin@content_server_01 install]$
[dmadmin@content_server_01 install]$ $JAVA_HOME/bin/java -version
java version "1.8.0_77"
Java(TM) SE Runtime Environment (build 1.8.0_77-b03)
Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)
[dmadmin@content_server_01 install]$

 

Now why am I saying that the IDS is twisted? Well, as you can see in the error message above, the path to the Java mentioned is "$DOCUMENTUM_SHARED/java/1.7.0_72". Where is this coming from? One might think at first look that this is the Java from a CS 7.2… and indeed it is the same version, since a CS 7.2 was using Java 7u72, but it's not even the same path: a CS 7.2 was using "$DOCUMENTUM_SHARED/java64/1.7.0_72" (notice the "java64" for 64-bit OS).

[dmadmin@content_server_01 install]$ $DOCUMENTUM_SHARED/java/1.7.0_72/bin/java -version
java version "1.7.0_72"
Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode)

 

But that's not all! If you take a look at the start script of the IDS 7.3 (below, $IDS_HOME = $DOCUMENTUM_SHARED/wildfly9.0.1), you will see yet another Java!

[dmadmin@content_server_01 install]$ grep JAVA_HOME $IDS_HOME/server/startWEBCACHE.sh
JAVA_HOME="$DM_HOME/webcache/jre/linux"
export JAVA_HOME
[dmadmin@content_server_01 install]$
[dmadmin@content_server_01 install]$
[dmadmin@content_server_01 install]$ $DM_HOME/webcache/jre/linux/bin/java -version
java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
[dmadmin@content_server_01 install]$

 

So when you install an IDS 7.3 on a CS 7.3 (which is already using Java 8u77), it will run using Java 8u91 and it will also install a Java 7u72… Why make it easy when you can make it complicated! Since the IDS only runs with Java 8u91, what's the purpose of the Java 7u72 then? Well, the only purpose I could find is precisely linked to this issue: installing DARs.

 

By default, the Headless Composer (since it is installed with the CS 7.3) will use Java 8u77 ($DOCUMENTUM_SHARED/java64/JAVA_LINK), but the IDS isn't able to install the DARs with Java 8. I don't know where the Java version to be used by the IDS during the installation is defined, but it might very well be hardcoded, because the installer uses the file /tmp/dctm_dmadmin/install_xxxxx/istempxxxxxyyyyyyyy/bundledLinux.jvm to know which Java to use and it's always the Java 7u72 (the bundled one). You can try to update the file "$DM_HOME/webcache/install/install_info.ini" with a different "BUNDLED_JAVA_HOME" but it won't change anything.

 

So then how can you control which Java should be used by the Headless Composer? That’s done in the java.ini file!

[dmadmin@content_server_01 install]$ grep -E "^java_library_path|^java_classpath" $DM_HOME/install/composer/ComposerHeadless/plugins/com.emc.ide.external.dfc_1.0.0/dmbasic/linux/java.ini
java_library_path = $DOCUMENTUM_SHARED/java64/JAVA_LINK/jre/lib/amd64/libjava.so
java_classpath = $DM_HOME/dctm-server.jar:$DOCUMENTUM_SHARED/dctm.jar:$DOCUMENTUM_SHARED/config:$DOCUMENTUM_SHARED/java64/JAVA_LINK/jre/lib
[dmadmin@content_server_01 install]$

 

As you can see above (and I already said this), the Headless Composer uses the Java that comes with the CS 7.3, so by default Java 8u77 ($DOCUMENTUM_SHARED/java64/JAVA_LINK). If you try to change the Java used in the java.ini file from Java 8u77 (CS 7.3) to Java 8u91 (IDS 7.3), it will still not work. What you need to do is change it to the Java that the IDS expects, and you can do it like this:

[dmadmin@content_server_01 install]$ grep -E "^java_library_path|^java_classpath" $DM_HOME/install/composer/ComposerHeadless/plugins/com.emc.ide.external.dfc_1.0.0/dmbasic/linux/java.ini
java_library_path = $DOCUMENTUM_SHARED/java64/JAVA_LINK/jre/lib/amd64/libjava.so
java_classpath = $DM_HOME/dctm-server.jar:$DOCUMENTUM_SHARED/dctm.jar:$DOCUMENTUM_SHARED/config:$DOCUMENTUM_SHARED/java64/JAVA_LINK/jre/lib
[dmadmin@content_server_01 install]$
[dmadmin@content_server_01 install]$ cp $DM_HOME/install/composer/ComposerHeadless/plugins/com.emc.ide.external.dfc_1.0.0/dmbasic/linux/java.ini $DM_HOME/install/composer/ComposerHeadless/plugins/com.emc.ide.external.dfc_1.0.0/dmbasic/linux/java.ini_orig_before_IDS
[dmadmin@content_server_01 install]$
[dmadmin@content_server_01 install]$
[dmadmin@content_server_01 install]$ export IDS_JAVA="$DOCUMENTUM_SHARED/java/1.7.0_72"
[dmadmin@content_server_01 install]$
[dmadmin@content_server_01 install]$ sed -i "s,$JAVA_HOME,$IDS_JAVA," $DM_HOME/install/composer/ComposerHeadless/plugins/com.emc.ide.external.dfc_1.0.0/dmbasic/linux/java.ini
[dmadmin@content_server_01 install]$
[dmadmin@content_server_01 install]$ grep -E "^java_library_path|^java_classpath" $DM_HOME/install/composer/ComposerHeadless/plugins/com.emc.ide.external.dfc_1.0.0/dmbasic/linux/java.ini
java_library_path = $DOCUMENTUM_SHARED/java/1.7.0_72/jre/lib/amd64/libjava.so
java_classpath = $DM_HOME/dctm-server.jar:$DOCUMENTUM_SHARED/dctm.jar:$DOCUMENTUM_SHARED/config:$DOCUMENTUM_SHARED/java/1.7.0_72/jre/lib
[dmadmin@content_server_01 install]$

 

After doing that, you can try to execute the IDS Source Configurator again and this time, it will be able to install the IDS DARs into the target docbase. Don’t forget to restore the java.ini to its initial value afterwards…
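If you want to see the whole backup/modify/restore cycle at a glance, here is a self-contained sketch of it, run against a throwaway copy of the java.ini (paths are shortened and hypothetical; the real file is the one shown in the commands above):

```shell
#!/bin/sh
# Simulate the java.ini edit on a throwaway copy (the real file sits under
# $DM_HOME/install/composer/ComposerHeadless/.../dmbasic/linux/java.ini).
WORK_DIR=$(mktemp -d)
echo 'java_library_path = /app/shared/java64/JAVA_LINK/jre/lib/amd64/libjava.so' > "$WORK_DIR/java.ini"

# 1. Backup before touching anything (as done with java.ini_orig_before_IDS).
cp "$WORK_DIR/java.ini" "$WORK_DIR/java.ini_orig_before_IDS"

# 2. Point the Headless Composer to the bundled Java 7u72.
sed -i 's,/app/shared/java64/JAVA_LINK,/app/shared/java/1.7.0_72,' "$WORK_DIR/java.ini"
AFTER_EDIT=$(cat "$WORK_DIR/java.ini")

# 3. Restore the original once the IDS DARs are installed.
cp "$WORK_DIR/java.ini_orig_before_IDS" "$WORK_DIR/java.ini"
AFTER_RESTORE=$(cat "$WORK_DIR/java.ini")

echo "$AFTER_EDIT"
echo "$AFTER_RESTORE"
rm -rf "$WORK_DIR"
```

Taking the backup first is the important part: the Headless Composer is shared with other components, so leaving it pointed at the Java 7u72 after the IDS configuration would be asking for trouble.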

 

 

The article Documentum – Unable to configure IDS 7.3 for a docbase appeared first on Blog dbi services.

Documentum – ActiveX error 12019 in D2-Config during export

Fri, 2017-11-10 11:50

Some months ago at a customer, we started the rollout of some security baselines on a new application (not yet in production). One of the changes was to enforce the use of TLSv1.2 on all our Documentum Clients like D2/D2-Config (4.5, 4.6, 4.7), DA (7.2, 7.3), and so on… TLSv1.2 was already enabled before that, but there was also a fallback to TLSv1.1 or 1.0. For security reasons, at some point you need to ensure that only TLSv1.2 can be used, because the previous versions contain vulnerabilities… Obviously, there was some testing and validation to ensure that the Documentum DFC Clients could handle TLSv1.2, and everything was working properly.

 

A few days later, we started to receive a few tickets from developers that couldn’t export the configuration from D2-Config with an ActiveX error 12019:

[Screenshot: ActiveX error 12019 shown during the D2-Config export]

 

The strange thing is that it looked like a random issue, because on some workstations it was working properly and on a few others it wasn't. We took, for example, two Windows 7 workstations with the same OS and patch level, the same ActiveX and the same browser (IE11 with TLSv1.0, 1.1 and 1.2 enabled), and the issue could only be seen on one of the two.

After more tests, it appeared that the issue could only be reproduced when D2-Config was using TLSv1.2 and the client workstation was running Windows 7 or Windows 8 (but, again, not on all such workstations…). With W8.1 or W10, it was always working.

 

So looking into this, we found on the Microsoft support site a registry value that could help. As described on that page, when an application specifies "WINHTTP_OPTION_SECURE_PROTOCOLS", the OS checks for a value in the registry and uses it if present. If the registry entry isn't present, it then uses the OS default: for Windows 7 and 8, only SSLv3 and TLSv1.0 are enabled; for Windows 8.1 and 10, SSLv3, TLSv1.0, TLSv1.1 and TLSv1.2 are enabled.

 

The registry key to edit can be found here:

  • 32 bits OS: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Internet Settings\WinHttp
  • 64 bits OS: HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Windows\CurrentVersion\Internet Settings\WinHttp

 

At this location, you will probably not have many values, so the one to add to control this behavior is "DefaultSecureProtocols" (DWORD). As mentioned on the Microsoft website, the possible values are as follows:

DefaultSecureProtocols value   Protocol enabled
0x00000008                     Enable SSL 2.0 by default
0x00000020                     Enable SSL 3.0 by default
0x00000080                     Enable TLS 1.0 by default
0x00000200                     Enable TLS 1.1 by default
0x00000800                     Enable TLS 1.2 by default

With these values in mind, you can do some simple additions (in hexadecimal, of course) to enable several protocols at once. For example:

  • Enabling TLSv1.1 + TLSv1.2 => 0x00000200 + 0x00000800 = 0x00000A00
  • Enabling TLSv1.0 + TLSv1.1 + TLSv1.2 => 0x00000080 + 0x00000200 + 0x00000800 = 0x00000A80
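If you don't trust your hexadecimal mental arithmetic, the shell can double-check these sums for you (flag values taken from the Microsoft table above):

```shell
#!/bin/sh
# DefaultSecureProtocols flag values from the Microsoft table.
TLS10=0x00000080
TLS11=0x00000200
TLS12=0x00000800

# $(( )) understands the 0x prefix; printf renders the sum back as a DWORD.
A=$(printf '0x%08X' $(( TLS11 + TLS12 )))           # TLSv1.1 + TLSv1.2
B=$(printf '0x%08X' $(( TLS10 + TLS11 + TLS12 )))   # TLSv1.0 + TLSv1.1 + TLSv1.2
echo "$A"
echo "$B"
```

This prints 0x00000A00 and 0x00000A80 respectively, the two combinations listed above.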

 

In our case, IE11 supports TLSv1.0 onwards, and therefore we enabled the same range (0x00000A80) for the "DefaultSecureProtocols" registry value, to avoid possible issues with other/older applications. Since our Documentum DFC Clients are restricted to TLSv1.2, the end-user workstations will not have the option of falling back to weaker protocols, so we are on the safe side for our applications. After doing that, all Windows 7 workstations (without exception) were able to export the D2-Config configuration without issue.

 

Note: All W7 workstations were part of the same domain with the same setup, the same GPOs and no admin rights for the end-users, so this registry value wasn't set anywhere… Yet the issue appeared only on some Windows 7 workstations and it is still unclear why… There must be a difference somewhere but we still haven't found it. That's why this registry change will only be packaged as a fix for the workstations where the issue is present, and only until everybody has moved to Windows 8.1 or later.

 

The article Documentum – ActiveX error 12019 in D2-Config during export appeared first on Blog dbi services.
