Yann Neuhaus

dbi services technical blog

DctmAPI.py revisited

Thu, 2020-12-31 16:59

2 years ago, I proposed a ctypes-based Documentum extension for python, DctmAPI.py. While it did the job, it was quite basic. For example, its select2dict() function, as its name suggests, returned the documents from a dql query into a list of dictionaries, one per document, all in memory. While this is OK for testing and demonstration purposes, it can potentially put some stress on the available memory; besides, do we really need to hold a complete result set of several hundred thousand rows in memory at once? It makes more sense to iterate over the result and process it row by row. For instance, databases have cursors for that very purpose.
Another rudimentary demonstration function was select(). Like select2dict(), it executed a dql statement but output the result row by row to stdout without any special attempt at pretty printing it. The result was quite crude, yet OK for testing purposes.
So, after 2 years, I thought it was about time to revamp this interface and make it more practical. A new generator-based function, co_select(), has been introduced for a more efficient processing of the result set. select2dict() is still available for those cases where it is handy to have a full result set in memory and the volume is manageable; actually, select2dict() is now down to 2 lines, the second one being a list comprehension around co_select() (see the listing below). select() has become select_to_stdout() and its output is much enhanced; it can be json or tabular, with optional column-wrapping à la sql*plus and colorization as well, all stuff I mentioned several times in the past, e.g. here. Moreover, a pagination functionality has been added through the functions paginate() and paginate_to_stdout(). Finally, exceptions and message logging have been used liberally. As can be seen, these are quite substantial improvements over the original version. Of course, there are many ways to implement them, depending on the level of usability and performance one is after. Also, new functionalities, maybe unexpected ones as of this writing, may turn out to be necessary, so the current functions are only to be taken as examples.
Let’s see what the upgraded module looks like now.

Listing of DctmAPI.py
"""
This module is a python - Documentum binding based on ctypes;
requires libdmcl40.so/libdmcl.so to be reachable through LD_LIBRARY_PATH;
initial version, C. Cervini - dbi-services.com - May 2018
revised, C. Cervini - dbi-services.com - December 2020

The binding works as-is for both python2 and python3; no recompilation required; that's the good thing with ctypes compared to e.g. distutils/SWIG;
Under a 32-bit O/S, it must use the libdmcl40.so, whereas under a 64-bit Linux it must use the java backed one, libdmcl.so;

For compatibility with python3 (where strings are now unicode and no longer arrays of bytes), ctypes string parameters are always converted to bytes, either by prefixing them
with a b if literal or by invoking their encode('ascii', 'ignore') method; to get back to text from bytes, b.decode() is used; this works in python2 as well as in python3, so the source is compatible with both versions of the language;

Because of the use of f-string formatting, python 3.6 minimum is required;
"""

import os
import ctypes
import sys, traceback
import json

# use foreign C library;
# use this library in eContent server < v6.x, 32-bit Linux;
dmlib = '/home/dmadmin/documentum53/libdmcl40.so'
dmlib = 'libdmcl40.so'

# use this library in eContent server >= v6.x, 64-bit Linux;
dmlib = 'libdmcl.so'

# used by ctypes;
dm = 0

# maximum cache size in rows;
# used while calling the paginate() function;
# set this according to the row size and the available memory;
# set it to 0 for unlimited memory;
MAX_CACHE_SIZE = 10000

# incremental log verbosity levels, i.e. include previous levels;
class LOG_LEVEL:
   # no logging;
   nolog = 0

   # informative messages;
   info = 1

   # errors, i.e. exceptions messages and less;
   error = 2

   # debug, i.e. functions calls and less;
   debug = 3

   # current active level;
   log_level = error
   
class dmException(Exception):
   """
   generic, catch-all documentum exception;
   """
   def __init__(self, origin = "", message = None):
      super().__init__(message)
      self.origin = origin
      self.message = message

   def __repr__(self):
      return f"exception in {self.origin}: {self.message if self.message else ''}"

def show(level = LOG_LEVEL.error, mesg = "", beg_sep = "", end_sep = ""):
   """
   displays the message msg if allowed
   """
   if LOG_LEVEL.log_level > LOG_LEVEL.nolog and level <= LOG_LEVEL.log_level:
      print(f"{beg_sep} {repr(mesg)} {end_sep}")

def dmInit():
   """
   initializes the Documentum part;
   since they already have an implicit namespace through their dm prefix, dm.dmAPI* would be redundant, so the dmAPI*() functions defined later are wrappers around their respective dm.dmAPI*() counterparts;
   returns True if no error, False otherwise;
   """
   show(LOG_LEVEL.debug, "in dmInit()")
   global dm
   try:
      dm = ctypes.cdll.LoadLibrary(dmlib);  dm.restype = ctypes.c_char_p
      show(LOG_LEVEL.debug, f"in dmInit(), dm= {str(dm)} after loading library {dmlib}")
      dm.dmAPIInit.restype    = ctypes.c_int;
      dm.dmAPIDeInit.restype  = ctypes.c_int;
      dm.dmAPIGet.restype     = ctypes.c_char_p;      dm.dmAPIGet.argtypes  = [ctypes.c_char_p]
      dm.dmAPISet.restype     = ctypes.c_int;         dm.dmAPISet.argtypes  = [ctypes.c_char_p, ctypes.c_char_p]
      dm.dmAPIExec.restype    = ctypes.c_int;         dm.dmAPIExec.argtypes = [ctypes.c_char_p]
      status  = dm.dmAPIInit()
   except Exception as e:
      show(LOG_LEVEL.error, "exception in dmInit():")
      show(LOG_LEVEL.error, e)
      if LOG_LEVEL.log_level > LOG_LEVEL.error: traceback.print_stack()
      status = False
   else:
      status = True
   finally:
      show(LOG_LEVEL.debug, "exiting dmInit()")
      return status
   
def dmAPIDeInit():
   """
   releases the memory structures in documentum's library;
   returns True if no error, False otherwise;
   """
   show(LOG_LEVEL.debug, "in dmAPIDeInit()")
   try:
      dm.dmAPIDeInit()
   except Exception as e:
      show(LOG_LEVEL.error, "exception in dmAPIDeInit():")
      show(LOG_LEVEL.error, e)
      if LOG_LEVEL.log_level > LOG_LEVEL.error: traceback.print_stack()
      status = False
   else:
      status = True
   finally:
      show(LOG_LEVEL.debug, "exiting dmAPIDeInit()")
      return status
   
def dmAPIGet(s):
   """
   passes the string s to dmAPIGet() method;
   returns a non-empty string if OK, None otherwise;
   """
   show(LOG_LEVEL.debug, "in dmAPIGet()")
   try:
      value = dm.dmAPIGet(s.encode('ascii', 'ignore'))
   except Exception as e:
      show(LOG_LEVEL.error, "exception in dmAPIGet():")
      show(LOG_LEVEL.error, e)
      if LOG_LEVEL.log_level > LOG_LEVEL.error: traceback.print_stack()
      status = False
   else:
      status = True
   finally:
      show(LOG_LEVEL.debug, "exiting dmAPIGet()")
      return value.decode() if status and value is not None else None

def dmAPISet(s, value):
   """
   passes the string s to dmAPISet() method;
   returns TRUE if OK, False otherwise;
   """
   show(LOG_LEVEL.debug, "in dmAPISet()")
   try:
      status = dm.dmAPISet(s.encode('ascii', 'ignore'), value.encode('ascii', 'ignore'))
   except Exception as e:
      show(LOG_LEVEL.error, "exception in dmAPISet():")
      show(LOG_LEVEL.error, e)
      if LOG_LEVEL.log_level > LOG_LEVEL.error: traceback.print_stack()
      status = False
   else:
      status = True
   finally:
      show(LOG_LEVEL.debug, "exiting dmAPISet()")
      return status

def dmAPIExec(stmt):
   """
   passes the string s to dmAPIExec() method;
   returns TRUE if OK, False otherwise;
   """
   show(LOG_LEVEL.debug, "in dmAPIExec()")
   try:
      status = dm.dmAPIExec(stmt.encode('ascii', 'ignore'))
   except Exception as e:
      show(LOG_LEVEL.error, "exception in dmAPIExec():")
      show(LOG_LEVEL.error, e)
      if LOG_LEVEL.log_level > LOG_LEVEL.error: traceback.print_stack()
      status = False
   else:
      # no error, status is passed through, to be converted to boolean below;
      pass
   finally:
      show(LOG_LEVEL.debug, "exiting dmAPIExec()")
      return True == status 

def connect(docbase, user_name, password):
   """
   connects to given docbase as user_name/password;
   returns a session id if OK, None otherwise
   """
   show(LOG_LEVEL.debug, "in connect(), docbase = " + docbase + ", user_name = " + user_name + ", password = " + password) 
   try:
      session = dmAPIGet(f"connect,{docbase},{user_name},{password}")
      if session is None:
         raise dmException(origin = "connect()", message = f"unsuccessful connection to docbase {docbase} as user {user_name}")
   except dmException as dme:
      show(LOG_LEVEL.error, dme)
      show(LOG_LEVEL.error, dmAPIGet(f"getmessage,{session}").rstrip())
      if LOG_LEVEL.log_level > LOG_LEVEL.error: traceback.print_stack()
      session = None
   else:
      show(LOG_LEVEL.debug, f"successful session {session}")
      # emptying the message stack in case some are left from previous calls;
      while True:
         msg = dmAPIGet(f"getmessage,{session}")
         if msg is None or not msg.rstrip():
            break
         show(LOG_LEVEL.debug, msg.rstrip())
   finally:
      show(LOG_LEVEL.debug, "exiting connect()")
      return session

def execute(session, dql_stmt):
   """
   execute non-SELECT DQL statements;
   returns TRUE if OK, False otherwise;
   """
   show(LOG_LEVEL.debug, f"in execute(), dql_stmt={dql_stmt}")
   try:
      query_id = dmAPIGet(f"query,{session},{dql_stmt}")
      if query_id is None:
         raise dmException(origin = "execute()", message = f"query {dql_stmt}")
      err_flag = dmAPIExec(f"close,{session},{query_id}")
      if not err_flag:
         raise dmException(origin = "execute()", message = "close")
   except dmException as dme:
      show(LOG_LEVEL.error, dme)
      show(LOG_LEVEL.error, dmAPIGet(f"getmessage,{session}").rstrip())
      status = False
   except Exception as e:
      show(LOG_LEVEL.error, "exception in execute():")
      show(LOG_LEVEL.error, e)
      if LOG_LEVEL.log_level > LOG_LEVEL.error: traceback.print_stack()
      status = False
   else:
      status = True
   finally:
      show(LOG_LEVEL.debug, "exiting execute()")
      return status

def co_select(session, dql_stmt, numbering = False):
   """
   a coroutine version of former select2dict;
   the result set is returned one row at a time as a dictionary by a yield statement, e.g.:
   {"attr-1": "value-1", "attr-2": "value-2", ... "attr-n": "value-n"}
   in case of repeating attributes, value is an array of values, e.g.:
   { .... "attr-i": ["value-1", "value-2".... "value-n"], ....}
   """
   show(LOG_LEVEL.debug, "in co_select(), dql_stmt=" + dql_stmt)
   try:
      query_id = dmAPIGet(f"query,{session},{dql_stmt}")
      if query_id is None:
         show(LOG_LEVEL.error, f'in co_select(), error in dmAPIGet("query,{session},{dql_stmt}")')
         raise dmException(origin = "co_select", message = f"query {dql_stmt}")

      # counts the number of returned rows in the result set;
      row_counter = 0

      # list of attributes returned by query;
      # internal use only; the caller can compute it at will through the following expression: results[0].keys();
      attr_names = []

      # default number of rows to return at once;
      # can be dynamically changed by the caller through send();
      size = 1

      # multiple rows are returned as an array of dictionaries;
      results = []

      # iterate through the result set;
      while dmAPIExec(f"next,{session},{query_id}"):
         result = {"counter" : f"{row_counter + 1}"} if numbering else {}
         nb_attrs = dmAPIGet(f"count,{session},{query_id}")
         if nb_attrs is None:
            raise dmException(origin = "co_select", message = "count")
         nb_attrs = int(nb_attrs) 
         for i in range(nb_attrs):
            if 0 == row_counter:
               # get the attributes' names only once for the whole query;
               value = dmAPIGet(f"get,{session},{query_id},_names[{str(i)}]")
               if value is None:
                  raise dmException(origin = "co_select", message = f"get ,_names[{str(i)}]")
               attr_names.append(value)

            is_repeating = dmAPIGet(f"repeating,{session},{query_id},{attr_names[i]}")
            if is_repeating is None:
               raise dmException(origin = "co_select", message = f"repeating {attr_names[i]}")
            is_repeating = 1 == int(is_repeating)

            if is_repeating:
               # multi-valued attributes;
               result[attr_names[i]] = []
               count = dmAPIGet(f"values,{session},{query_id},{attr_names[i]}")
               if count is None:
                  raise dmException(origin = "co_select", message = f"values {attr_names[i]}")
               count = int(count)

               for j in range(count):
                  value = dmAPIGet(f"get,{session},{query_id},{attr_names[i]}[{j}]")
                  if value is None:
                     value = "null"
                  result[attr_names[i]].append(value)
            else:
               # mono-valued attributes;
               value = dmAPIGet(f"get,{session},{query_id},{attr_names[i]}")
               if value is None:
                  value = "null"
               result[attr_names[i]] = value

         row_counter += 1
         results.append(result)

         size -= 1
         if size > 0:
            # a grouping has been requested;
            continue
         
         while True:
            # keeps returning the same results until the group size is non-negative;
            # default size value if omitted is 1, so next(r) keeps working;
            # if the size is 0, abort the result set;
            size = yield results
            if size is None:
               # default value is 1;
               size = 1
               break
            if size >= 0:
               # OK if size is positive or 0;
               break
         results = []
         if 0 == size: break

      err_flag = dmAPIExec(f"close,{session},{query_id}")
      if not err_flag:
         raise dmException(origin = "co_select", message = "close")

      # if here, it means that the full result set has been read;
      # the finally clause will return the residual (i.e. out of the yield statement above) rows;

   except dmException as dme:
      show(LOG_LEVEL.error, dme)
      show(LOG_LEVEL.error, dmAPIGet(f"getmessage,{session}").rstrip())
   except Exception as e:
      show(LOG_LEVEL.error, "exception in co_select():")
      show(LOG_LEVEL.error, e)
      if LOG_LEVEL.log_level > LOG_LEVEL.error: traceback.print_stack()
   finally:
      # close the collection;
      try:
         show(LOG_LEVEL.debug, "exiting co_select()")
         dmAPIExec(f"close,{session},{query_id}")
      except Exception as e:
         pass
      return results
      # for some unknown reason, an exception is raised on returning ...;
      # let the caller handle it;

def select_to_dict(session, dql_stmt, numbering = False):
   """
   new version of the former select2dict();
   execute in session session the DQL SELECT statement passed in dql_stmt and return the result set into an array of dictionaries;
   as the whole result set will be held in memory, be sure it is really necessary and rather use the more efficient co_select();
   """
   result = co_select(session, dql_stmt, numbering)
   return [row for row in result]

def result_to_stdout(result, format = "table", column_width = 20, mode = "wrap", frame = True, fg_color = "BLACK", bg_color = "white", alt_period = 5, col_mode = 2, numbering = False):
   """
      print the list of dictionaries result into a table with column_width-wide columns and optional wrap-around and frame;
      result can be a generator from co_select() or an array of dictionaries;
      the output is like from idql only more readable with column wrap-around if values are too wide;
      if frame is True, a frame identical to the one from mysql/postgresql is drawn around the table;
      in order to increase readability, rows can be colorized by specifying a foreground and a background colors;
      alt_period is the number of rows to print in fg_color/bg_color before changing to bg_color/fg_color;
      if col_mode is:
         0: no colorization is applied;
         1: text color alternates between fg/bg and bg/fg every alt_period row blocks;
         2: alt_period row blocks are colorized 1st line fg/bg and the rest bg/fg
      color naming is different of termcolor's; we use the following convention which is later converted to termcolor's:
      bright text colors (does not apply to background color) are identified by the uppercase strings: "BLACK", "RED", "GREEN", "YELLOW", "BLUE", "MAGENTA", "CYAN", "WHITE";
      normal intensity colors are identified by the capitalized lowercase strings: "Black", "Red", "Green", "Yellow", "Blue", "Magenta", "Cyan", "White";
      dim intensity colors are identified by the lowercase strings: "black", "red", "green", "yellow", "blue", "magenta", "cyan", "white";
      if numbering is True and a tabular format is chosen, a column holding the row number is prependended to the table;
   """

   # let's use the termcolor package wrapper around the ANSI color escape sequences;
   from copy import deepcopy
   from termcolor import colored, cprint

   if fg_color[0].isupper() and fg_color[1:].islower():
      # capitalized name: normal intensity;
      fg_color = fg_color.lower()
      attr = []
   elif fg_color.islower():
      # all lowercase name: dim intensity;
      attr = ["dark"]
   elif fg_color.isupper():
      # all uppercase name: bright intensity;
      attr = ["bold"]
      fg_color = fg_color.lower()
   else:
      show(LOG_LEVEL.error, f"unsupported color {fg_color}; it must either be all uppercase or all lowercase or capitalized lowercase")
   if bg_color.isupper():
      bg_color = bg_color.lower()
   elif not bg_color.islower():
      show(LOG_LEVEL.error, f"unsupported color {bg_color}; it must either be all uppercase or all lowercase")

   # remap black to termcolor's grey;
   if "black" == fg_color:
      fg_color = "grey"
   if "black" == bg_color:
      bg_color = "grey"

   bg_color = "on_" + bg_color
   color_current_block = 0

   max_counter_digits = 7

   def colorization(index):
      nonlocal color_current_block, ind
      if 0 == col_mode:
         return "", "", []
      elif 1 == col_mode:
         #1: fg/bg every alt_period rows then switch to bg/fg for alt_period rows, then back again;
         if 0 == index % alt_period: 
            color_current_block = (color_current_block + 1) % 2
         return fg_color, bg_color, attr + ["reverse"] if 0 == color_current_block % 2 else attr
      else:
         #2: fg/bg as first line of every alt_period rows, then bg/fg;
         return fg_color, bg_color, attr if 0 == index % alt_period else attr + ["reverse"]

   def rows_to_stdout(rows, no_color = False):
      """
         print the list of dictionaries in rows in tabular format using the parent function's parameters;
         the first column hold the row number; we don't expect more than 10^max_counter_digits - 1 rows; if more and numbering is True, the table will look distorted, just increase max_counter_digits;
      """
      btruncate = "truncate" == mode
      ellipsis = "..."
      for i, row in enumerate(rows):
         # preserve the original data as they may be referenced elsewhere;
         row = deepcopy(row)
         # hack to keep history of printed rows...;
         col_fg, col_bg, col_attr = colorization(max(ind,i)) if 0 != col_mode and not no_color else ("white", "on_grey", [])
         while True:
            left_over = ""
            line = ""
            nb_fields = len(row)
            pos = 0
            for k,v in row.items():
               nb_fields -= 1
               Min = max(column_width, len(ellipsis)) if btruncate else column_width

               # extract the next piece of the column and pad it with blanks to fill the width if needed;
               if isinstance(v, list):
                  # process repeating attributes;
                  columnS = "{: <{width}}".format(v[0][:Min] if v else "", width = column_width if not (0 == pos and numbering) else max_counter_digits)
                  restColumn = btruncate and v and len(v[0]) > Min
               else:
                  columnS = "{: <{width}}".format(v[:Min], width = column_width if not (0 == pos and numbering) else max_counter_digits)
                  restColumn = btruncate and v and len(v) > Min
               if restColumn:
                  columnS = columnS[ : len(columnS) - len(ellipsis)] + ellipsis

               # cell content colored only vs. the whole line;
               #line += ("|  " if frame else "") + colored(columnS, col_fg, col_bg, col_attr) + ("  " if frame else ("  " if nb_fields > 0 else ""))
               line += colored(("|  " if frame else "") + columnS + ("  " if frame or nb_fields > 0 else ""), col_fg, col_bg, col_attr)

               if isinstance(v, list):
                  # process repeating attributes;
                  restS = v[0][Min : ] if v else ""
                  if restS:
                     v[0] = restS
                  elif v:
                     # next repeating value;
                     v.pop(0)
                     restS = v[0] if v else ""
               else:
                  restS = v[Min : ]
                  row[k] = v[Min : ]
               left_over += "{: <{width}}".format(restS, width = column_width if not (0 == pos and numbering) else max_counter_digits)
               pos += 1
            # cell content colored only vs. the whole line;
            #print(line + ("|" if frame else ""))
            print(line + colored("|" if frame else "", col_fg, col_bg, col_attr))
            left_over = left_over.rstrip(" ")
            if not left_over or btruncate:
               break

   def print_frame_line(nb_columns, column_width = 20):
      line = ""
      while nb_columns > 0:
         line += "+" + "{:-<{width}}".format('', width = (column_width if not (1 == nb_columns and numbering) else max_counter_digits) + 2 + 2)
         nb_columns -= 1
      line += "+"
      print(line)
      return line

   # result_to_stdout;
   try:
      if not format:
         # no output is requested;
         return
      if "json" != format and "table" != format:
         raise dmException(origin = "result_to_stdout", message = "format must be either json or table")
      if "wrap" != mode and "truncate" != mode:
         raise dmException(origin = "result_to_stdout", message = "invalid mode; mode must be either wrap or truncate")
      if "json" == format:
         for r in result:
            print(json.dumps(r, indent = 3))
      else:
         for ind, r in enumerate(result):
            # print the rows in result set or list one at a time;
            if 0 == ind:
               # print the column headers once;
               # print the frame's top line;
               frame_line = print_frame_line(len(r[0]), column_width)
               rows_to_stdout([{k:k for k,v in r[0].items()}], no_color = True)
               print(frame_line)
            rows_to_stdout(r)
         # print the frame's bottom line;
         print(frame_line)
   except dmException as dme:
      show(LOG_LEVEL.error, dme)

def select_to_stdout(session, dql_stmt, format = "table", column_width = 20, mode = "wrap", frame = True, fg_color = "BLACK", bg_color = "white", alt_period = 5, col_mode = 2, numbering = False):
   """
   execute in session session the DQL SELECT statement passed in dql_stmt and sends the properly formatted result to stdout;
   if format == "json", json.dumps() is invoked for each document;
   if format == "table", document is output in tabular format;
   """
   result = co_select(session, dql_stmt, numbering)
   result_to_stdout(result, format, column_width, mode, frame, fg_color, bg_color, alt_period, col_mode, numbering)

def paginate(cursor, initial_page_size, max_cache_size = MAX_CACHE_SIZE):
   """
   Takes the generator cursor and returns a closure handle that allows to move forwards and backwards in the result set it is bound to;
   a closure is used here so a context is preserved between calls (an alternate implementation could use a co-routine or a class);
   returns None if the result set is empty;
   rows are returned as an array of dictionaries;
   i.e. if the page size (in rows) is negative, the cursor goes back that many rows, otherwise it moves forwards;
   pages can be resized by passing a new page_size to the handle;
   use a page size of 0 to close the cursor;
   Usage:
          cursor = co_select(session, dql_stmt)
          handle = paginate(cursor, max_cache_size = 1000)
          # paginate forwards 50 rows:
          handle(50)
          # paginate backwards 50 rows:
          handle(-50)
          # change page_size to 50 rows while moving forward 20 rows;
          handle(20, 50)
          # close cursor;
          handle(0)
          cursor.send(0)
   the rows from the result set that have been fetched so far are kept in cache so that they can be returned when paginating back;
   the cache is automatically extended when paginating forwards; it is never emptied so it can be heavy on memory if the result set is very large and the forwards pagination goes very far into it;
   the cache has a settable max_cache_size limit with default MAX_CACHE_SIZE;
   """
   cache = []
   # current cache'size in rows;
   cache_size = 0

   # initialize current_page_size, it can change later;
   current_page_size = initial_page_size

   # index in cached result_set of first and last rows in page;
   current_top = current_bottom = -1

   # start the generator;
   # one row will be in the cache before even starting paginating and this is taken into account later;
   cache = next(cursor)
   if cache:
      current_top = current_bottom = 0
      cache_size = 1
   else:
      return None

   def move_window(increment, page_size = None):
      nonlocal cache, cache_size, current_top, current_bottom, current_page_size
      if page_size is None:
         # work-around the default parameter value being fixed at definition time...
         page_size = current_page_size
      # save the new page size in case it has changed;
      current_page_size = page_size
      if increment > 0:
         # forwards pagination;
         if current_bottom + increment + 1 > cache_size:
            # "page fault": must fetch the missing rows to complete the requested page size;
            if current_bottom + increment > max_cache_size:
               # the cache size limit has been reached;
               # note that the above formula does not always reflect reality, i.e. if fewer rows are returned than asked for because the result set's end has been reached;
               # in such cases, page_size will be adjusted to fit max_cache_size;
               show(LOG_LEVEL.info, f"in cache_logic, maximum allowed cache size of {max_cache_size} reached")
               increment = max_cache_size - current_bottom
            delta = increment if cache_size > 1 else increment - 1 # because of the starting one row in cache;
            cache += cursor.send(delta)
            cache_size += delta # len(cache)
            current_bottom += delta
         else:
            current_bottom += increment
         current_top = max(0, current_bottom - page_size + 1)
         return cache[current_top : current_bottom + 1]
      elif increment < 0:
         # backwards pagination;
         increment = abs(increment)
         current_top = max(0, current_top - increment)
         current_bottom = min(cache_size, current_top + page_size) - 1
         return cache[current_top : current_bottom + 1]
      else:
         # increment is 0: close the generator;
         # must trap the strange exception after the send();
         try:
            cursor.send(0)
         except:
            pass
         return None
   return move_window

def paginate_to_stdout(session, dql_stmt, page_size = 20, format = "table", column_width = 20, mode = "wrap", frame = True, fg_color = "BLACK", bg_color = "white", alt_period = 5, col_mode = 2, numbering = False):
   """
      execute the dql statement dql_stmt in session session and output the result set in json or table format; if a tabular format is chosen, page_size is the maximum number of rows displayed at once;
      returns a handle to request the next pages or navigate backwards;
      example of usage:
              h = paginate_to_stdout(s, "select r_object_id, object_name, r_version_label from dm_document")
              if h:
                 # start the generator;
                 next(h)
                 # navigate the result set;
                 # paginate forwards 10 rows;
                 h.send(10)
                 # paginate forwards 20 rows;
                 h.send(20)
                 # paginate backwards 15 rows;
                 h.send(-15)
                 # close the handle; 
                 h.send(0)

   """
   try:
      q = co_select(session, dql_stmt, numbering)
      if not q:
         return None
      handle = paginate(q, page_size)
      while True:
         values = yield handle
         nb_rows = values[0] if isinstance(values, tuple) else values
         new_page_size = values[1] if isinstance(values, tuple) and len(values) > 1 else None
         if new_page_size:
            page_size = new_page_size
         if nb_rows is None:
            # default value is 1;
            nb_rows = 1
         if 0 == nb_rows:
            # exit request;
            break
         result_to_stdout([handle(nb_rows, page_size)], format, column_width, mode, frame, fg_color, bg_color, alt_period, col_mode, numbering)
   except Exception as e:
      show(LOG_LEVEL.error, e)

def describe(session, dm_type, is_type = True, format = "table", column_width = 20, mode = "wrap", frame = True, fg_color = "WHITE", bg_color = "BLACK", alt_period = 5, col_mode = 2):
   """
   describe dm_type as a type if is_type is True, as a registered table otherwise;
   optionally displays the output into a table or json if format is not None;
   returns the output of api's describe verb or None if an error occured;
   """
   show(LOG_LEVEL.debug, f"in describe(), dm_type={dm_type}")
   try:
      dump_str = dmAPIGet(f"describe,{session},{'type' if is_type else 'table'},{dm_type}")
      if dump_str is None:
         raise dmException(origin = "describe()", message = f"bad parameter {dm_type}")
      s = [{"attribute": l[0], "type": l[1]} for l in [i.split() for i in dump_str.split("\n")[5:-1]]]
      if format:
         result_to_stdout([s], format, column_width, mode, frame, fg_color, bg_color, alt_period, col_mode)
   except dmException as dme:
      show(LOG_LEVEL.error, dme)
      show(LOG_LEVEL.error, dmAPIGet(f"getmessage,{session}").rstrip())
   finally:
      show(LOG_LEVEL.debug, "exiting describe()")
      return dump_str

def disconnect(session):
   """
   closes the given session;
   returns True if no error, False otherwise;
   """
   show(LOG_LEVEL.debug, "in disconnect()")
   try:
      status = dmAPIExec("disconnect," + session)
   except Exception as e:
      show(LOG_LEVEL.error, "exception in disconnect():")
      show(LOG_LEVEL.error, e)
      if LOG_LEVEL.log_level > LOG_LEVEL.error: traceback.print_stack()
      status = False
   finally:
      show(LOG_LEVEL.debug, "exiting disconnect()")
      return status

# call module initialization;
dmInit()
Some comments

A few comments are in order. I’ll skip the ctypes part because it was already presented in the original blog.
On line 39, class LOG_LEVEL is defined to encapsulate the verbosity levels of the log messages, along with the currently active one. Levels are inclusive of the lesser ones; set LOG_LEVEL.log_level to LOG_LEVEL.nolog to turn off messages entirely. The default verbosity level is error, which means that only error messages are output, not debugging messages such as those on function entry and exit.
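For example, a calling script could adjust the verbosity as follows (a minimal sketch, assuming the module is imported as DctmAPI like in the test scripts further down):

# raise the verbosity to debug, which includes info and error messages;
DctmAPI.LOG_LEVEL.log_level = DctmAPI.LOG_LEVEL.debug
# or silence the module completely;
DctmAPI.LOG_LEVEL.log_level = DctmAPI.LOG_LEVEL.nolog
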
On line 55, class dmException defines the custom exception used to raise Documentum errors, e.g. on lines 189 and 190. The linked-in C library libdmcl.so does not raise exceptions; its calls just return a TRUE or FALSE status (non-zero or zero value). The interface remaps those values to True or False, or sometimes None. The default exception Exception is still handled (e.g. on lines 83 and 92), more for uniformity reasons than out of real necessity, although it cannot be totally excluded that ctypes raises some exception of its own under some circumstances. else and finally clauses are frequently used to remap the status or result value, return it, and clean up. Line 64 defines how the custom exception will be printed: it simply prints its instantiation parameters.
On line 235, function co_select() is defined. This is really the main function of the whole interface. Its purpose is to execute a SELECT DQL statement and return the rows on demand, rather than into one potentially large in-memory list of dictionaries (reminder: Python’s lists are equivalent to arrays, and its dictionaries to records, hashes or associative arrays in other languages). On line 316, the yield statement makes this possible; it is this statement that turns a traditional, unsuspecting function into a generator or coroutine (the distinction is really python terminology; conceptually the function is a coroutine). Here, yield works both ways: it returns a row, which makes the function a generator, but it can also optionally accept a number of rows to return at once, and 0 to stop the generator, which makes it a coroutine. On line 341, the exception handler’s finally clause closes the collection and, on line 348, returns the residual rows that were fetched but not returned yet because the end of the collection was reached and the yield statement was not executed.
One of the biggest pros of generators, in addition to saving memory, is to separate the navigation through the result set from the processing of the received data. Low-level, dirty technical details are therefore segregated into their own function, out of the way of the high-level data processing, resulting in clearer and less distracting code.
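As a minimal sketch of this (assuming the dmtest73 repository and credentials used in the examples below), a caller can process the result set row by row without knowing anything about collections; note that with the default group size of 1, each iteration yields a one-element list of dictionaries:

import DctmAPI

s = DctmAPI.connect("dmtest73", "dmadmin", "dmadmin")
cursor = DctmAPI.co_select(s, "select r_object_id, object_name from dm_document")
for rows in cursor:
   for doc in rows:
      # doc is a dictionary, e.g. {"r_object_id": "0900c350800001d0", "object_name": "..."};
      print(doc["r_object_id"], doc["object_name"])
DctmAPI.disconnect(s)
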
Note the function’s numbering parameter: when True, returned rows are numbered starting at 1. It might look like this feature is not really necessary because a SELECT statement could just include a (pseudo-)column such as ROWNUM (for Oracle RDBMS) or a sequence that would be treated as any other column, but things are not so easy. Interfacing a sequence to a registered table, and resetting it before usage, is possible but quite complicated and needs to be done at the database level, which makes it non-portable; besides, gaps in the sequence were observed, even when nocache was specified.
On line 352, the function select_to_dict() is defined for those cases where it still makes sense to hold a whole result set in memory at once. It does almost nothing, as the bulk of the work is done by co_select(). Line 359 executes a list comprehension that takes the generator returned by co_select() and forces it to be iterated until it meets its stop condition.
Skipping to line 519, function select_to_stdout() is another application of co_select(). This time, the received generator is passed to function result_to_stdout() defined on line 361; this function exemplifies outputting the data in a useful manner: it displays them to stdout either in json through the imported json library, or in tabular format. It can be used elsewhere each time such a presentation is sensible, e.g. from function describe() below, just make sure that the data are passed as a singleton list of a list of dictionaries (i.e. a list whose sole element is a list of dictionaries).
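For instance, any list of dictionaries, wherever it comes from, can be displayed with it; here is a minimal sketch with made-up data, with the extra enclosing list, exactly like describe() does below:

rows = [{"attribute": "object_name", "type": "string(255)"},
        {"attribute": "r_object_id", "type": "ID"}]
DctmAPI.result_to_stdout([rows], format = "table", column_width = 20)
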
There isn’t much to add about the well-known json format (see an example below) but the tabular presentation is quite rich in functionalities. It implements in python what was presented here and here with the addition of color; the driving goal was to get a readable and comfortable table output containing documents as rows and their attributes as columns. Interactivity can be achieved by piping the output of the function into the less utility, as illustrated below:

$ pip install termcolor
$ export PYTHONPATH=/home/dmadmin/dctm-DctmAPI:/home/dmadmin/.local/lib/python3.5/site-packages
$ cat - <<eot > test-table.py
#!/usr/bin/python3.6
import DctmAPI

s = DctmAPI.connect("dmtest73", "dmadmin", "dmadmin")
DctmAPI.select_to_stdout(s, "select r_object_id, object_name, title, owner_name, subject, r_version_label from dm_document", format = "table", column_width = 30, mode = "wrap", frame = True, fg_color = "YELLOW", bg_color = "BLACK", alt_period = 5, col_mode = 2)
eot
$ chmod +x test-table.py
$ ./test-table.py | less -R

Result:

Here, a tabular (format = “table”; use format = “json” for json output) representation of the data returned by the DQL statement has been requested, with 30 character-wide columns (column_width = 30); attributes too large to fit in their column are wrapped around; they could have been truncated instead by setting mode = “truncate”. A frame à la mysql or postgresql has been requested with frame = True. Row colorization has been requested with the first line of every block of 5 rows (alt_period = 5) in reverse color yellow on black and the others in black on yellow (col_mode = 2; use col_mode = 1 for alternating alt_period-line blocks colored fg/bg then bg/fg, and col_mode = 0 for no colorization).
The simple but very effective termcolor ANSI library is used here, which is a real relief compared to having to reimplement one myself for the 2nd or 3rd time in my life…
Note the use of the less command with the -R option so ANSI color escape sequences are passed through to the terminal and correctly rendered.
As a by-product, let’s generalize the snippet above into an independent, reusable utility:

$ cat test-table.py
#!/usr/bin/python3.6 
import argparse
import DctmAPI
                
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-d', '--docbase', action='store',
                        default='dmtest73', type=str,
                        nargs='?',
                        help='repository name [default: dmtest73]')
    parser.add_argument('-u', '--user_name', action='store',
                        default='dmadmin',
                        nargs='?',
                        help='user name [default: dmadmin]')
    parser.add_argument('-p', '--password', action='store',
                        default='dmadmin',
                        nargs='?',
                        help='user password [default: "dmadmin"]')
    parser.add_argument('-q', '--dql_stmt', action='store',
                        nargs='?',
                        help='DQL SELECT statement')
    args = parser.parse_args()
            
    session = DctmAPI.connect(args.docbase, args.user_name, args.password)
    if session is None:
       print(f"no session opened in docbase {args.docbase} as user {args.user_name}, exiting ...")
       exit(1)

    DctmAPI.select_to_stdout(session, args.dql_stmt, format = "table", column_width = 30, mode = "wrap", frame = True, fg_color = "YELLOW", bg_color = "BLACK", alt_period = 5, col_mode = 2)

# make it self-executable;
$ chmod +x test-table.py

# test it !
$ ./test-table.py -q "select r_object_id, object_name, title, owner_name, subject, r_version_label from dm_document" | less -R

# ship it !
# nah, kidding.

For completeness, here is an example of a json output:

s = DctmAPI.connect("dmtest73", "dmadmin", "dmadmin")
DctmAPI.select_to_stdout(s, "select r_object_id, object_name, r_version_label from dm_document", format = "json")
[
   {
      "r_object_id": "0900c350800001d0",
      "object_name": "Default Signature Page Template",
      "r_version_label": [
         "CURRENT",
         "1.0"
      ]
   }
]
...
[
   {
      "r_object_id": "0900c350800001da",
      "object_name": "Blank PowerPoint Pre-3.0 Presentation",
      "r_version_label": [
         "CURRENT",
         "1.0"
      ]
   }
]

Note the embedded list for the repeating attribute r_version_label; unlike relational tables, the json format suits documents from repositories perfectly well. It does not fully capture Documentum’s object-relational model, but it is close enough. Maybe one day, once hell has frozen over ;-), we’ll see a NoSQL implementation of Documentum, but I digress.
Back to the code: on line 528, function paginate() is defined. This function makes it possible to navigate a result set forwards and backwards in a table; the latter is possible by caching (more exactly, saving, as the data are cumulative and never replaced) the rows received so far. As parameters, it takes a cursor for the opened collection, a page size and the maximum cache size. In order to preserve its context, e.g. the cache and the pointers to the first and last rows displayed from the result set, the chosen implementation is a closure, with the inner function move_window() returned to the caller as a handle. Alternative implementations could be a class or, again, a co-routine. move_window() requests the rows from the cursor via send(nb_rows), as previously explained, and returns them as a list. A negative nb_rows means navigating backwards, i.e. the requested rows are returned from the cache instead of the cursor. Obviously, as the cache is dynamically extended up to the specified size and its content never released to make room for new rows, if one paginates to the bottom of a very large result set, a lot of memory can still be consumed because the whole result set ends up in memory. A more conservative implementation could get rid of older rows to accommodate the new ones, but at the cost of a reduced history depth, so it’s a trade-off; anyway, this subject is out of scope.
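Here is a minimal sketch of using the closure directly (it assumes an open session s; the DQL statement is just an illustration):

cursor = DctmAPI.co_select(s, "select r_object_id, object_name from dm_document")
handle = DctmAPI.paginate(cursor, initial_page_size = 20)
if handle:
   page = handle(20)       # move forwards 20 rows; page is a list of dictionaries;
   page = handle(-20)      # move backwards; the rows come from the cache;
   page = handle(10, 50)   # move forwards 10 rows and resize the page to 50 rows;
   handle(0)               # close the underlying cursor;
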
As its usage protocol may not be that simple at first, an example function paginate_to_stdout() is defined as a co-routine starting on line 613, with the same parameters as select_to_stdout(). It can be used as follows:

# connect to the repository;
s = DctmAPI.connect("dmtest73", "dmadmin", "dmadmin")

# demonstration of DctmAPI.paginate_to_stdout();
# request a pagination handle to the result set returned for the SELECT dql query below;
h = DctmAPI.paginate_to_stdout(s, "select r_object_id, object_name, title, owner_name, subject, r_version_label from dm_document", page_size = 5, format = "table", column_width = 30, mode = "wrap", frame = True, fg_color = "RED", bg_color = "black", alt_period = 5, col_mode = 1, numbering = True)  

print("starting the generator")
next(h)

nb_rows = 3
print(f"\nnext {nb_rows} rows")
h.send(nb_rows)

nb_rows = 10
print(f"\nnext {nb_rows} rows")
h.send(nb_rows)

nb_rows = 5
print(f"\nnext {nb_rows} rows and page_size incremented to 10")
h.send((nb_rows, 10))

nb_rows = 10
print(f"\nnext {nb_rows} row")
h.send(nb_rows)

nb_rows = -4 
print(f"\nprevious {nb_rows} rows")
h.send(nb_rows)

nb_rows = 12 
print(f"\nnext {nb_rows} rows and page_size decremented to 6")
h.send((nb_rows, 6))

nb_rows = -10 
print(f"\nprevious {nb_rows} rows")
h.send(nb_rows)

print(f"exiting ...")
try:
   h.send(0)
except:
   # trap the StopIteration exception;
   pass
sys.exit()

Here, each call to send() results in a table being displayed with the requested rows, as illustrated below:


Note how send() takes either a scalar or a tuple as parameter; when the page size needs to be changed, a tuple including the new page size is passed and unpacked on the receiving side (lines 640 and 641). It is a bit convoluted but it is a limitation of send(): as it takes only one argument, multiple values must be packed into a collection.
The snippet above could be generalized into a stand-alone interactive program that reads from the keyboard a number of rows as an offset to move backwards or forwards, which could be handy if saving the whole result set into a disk file is too expensive and only a few pages are needed; however, DQL already has the limiting clause enable(return_top N) for this purpose, so such a utility is not really essential.
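Should such a driver be wanted anyway, it could be sketched as follows (hypothetical code, not part of the module; it assumes an open session s, reads a signed row offset from the keyboard and reuses the handle returned by paginate_to_stdout()):

h = DctmAPI.paginate_to_stdout(s, "select r_object_id, object_name from dm_document", page_size = 10)
next(h)
while True:
   answer = input("rows to move (negative = backwards, empty or 0 = quit): ").strip()
   nb_rows = int(answer) if answer else 0
   if 0 == nb_rows:
      break
   try:
      h.send(nb_rows)
   except StopIteration:
      # the result set is exhausted;
      break
try:
   h.send(0)
except StopIteration:
   pass
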
On line 654, the describe() function returns as-is the result of the eponymous api verb, i.e. as a raw string with each item delimited by an end-of-line character (‘\n’ under Linux), for further processing by the caller; optionally, it can also output it as a table or as a json literal by taking advantage of the function result_to_stdout() and passing it the data appropriately formatted, on line 667, as a list of one list of dictionaries.
Here are two examples of outputs.

s = DctmAPI.connect("dmtest73", "dmadmin", "dmadmin")
desc_str = DctmAPI.describe(s, "dm_document", format = "json")
# json format:

s = DctmAPI.connect("dmtest73", "dmadmin", "dmadmin")
desc_str = DctmAPI.describe(s, "dm_document")
# Tabular format:


Finally, on line 693, the module is automatically initialized at load time.

Conclusion

The python language has evolved considerably from v2 to v3, the latest version as of this writing being 3.9. Each version brings a few small, visible enhancements; one example is f-string formatting (no pun intended), which is used here. Unfortunately, f-strings need python 3.6 minimum, which breaks compatibility with previous releases; fortunately, they can easily be replaced with older syntax alternatives if need be.
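As an illustration (a sketch only, not part of the module), the connect string built with an f-string in connect() has a direct str.format() equivalent for older interpreters:

# python >= 3.6, as used in the module;
session = dmAPIGet(f"connect,{docbase},{user_name},{password}")
# pre-3.6 equivalent;
session = dmAPIGet("connect,{},{},{}".format(docbase, user_name, password))
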
As usual, DctmAPI does not pretend to be the best python interface to Documentum ever. It has only been summarily tested and bugs could still be lurking around. I know, there are lots of possible improvements and additional functionalities, e.g. displaying acls, users and groups, maybe wrapping the module into classes, using more pythonic constructs, to name but a few. So, feel free to add your comments, corrections and suggestions below. They will all be taken into consideration and maybe implemented too if interesting enough. In the meantime, take care of yourself and your family. Happy New Year to everyone!


Migrate Oracle Database 9.2.0.6 to Oracle 19c using GoldenGate

Wed, 2020-12-30 07:42

When a customer wanted to take up the challenge of migrating an Oracle 9.2.0.6 database (prehistory in the Oracle world) to Oracle 19c using Oracle GoldenGate, I saw more problems than added value, for several reasons:

  • The Oracle 9.2.0.6 database is out of support (the final 9.2 patch was Oracle 9.2.0.8).
  • The customer's operating system was AIX 7.4, and only Oracle GoldenGate 11.1.1.1.2 for Oracle 9.2 for AIX 5.3 is available on https://edelivery.oracle.com.
  • Patch 13606038: ORACLE GOLDENGATE V11.1.1.0.31 FOR ORACLE 9I is not available for direct download; special support is needed to get it.

Oracle GoldenGate database Schema Profile check script

The first step is to download from Oracle Support the Oracle GoldenGate database Schema Profile check script, which queries the database by schema to identify the current configuration and any unsupported data types, or types that may need special consideration for Oracle GoldenGate in an Oracle environment:

  • Oracle GoldenGate database Schema Profile check script for Oracle DB (Doc ID 1296168.1) : full-schemaCheckOracle_07072020.sql

Even though Oracle Support mentions that this script is written for Oracle database versions 9i through 11g, some adaptations must be made for an Oracle 9.2.0.6 database:

First of all, add a parameter to pass the schema name as input:

oracle@aixSourceServer-Ora9i: /opt/oracle/software/goldengate> ls -ltr

vi full-schemaCheckOracle_07072020.sql

--Lazhar Felahi – 10.12.2020 - comment this line
--spool schemaCheckOracle.&&schema_name.out
--Lazhar Felahi – 10.12.2020 - comment this line
--ACCEPT schema_name char prompt 'Enter the Schema Name > '
variable b0 varchar2(50)
--Lazhar Felahi – 10.12.2020 - comment this line
--exec :b0 := upper('&schema_name');
--Lazhar Felahi – 10.12.2020 - add this line
exec :b0 := '&1';
--Lazhar Felahi – 10.12.2020 - comment this line
--spool schemaCheckOracle.&&schema_name..out
--Lazhar Felahi – 10.12.2020 - add this line
spool schemaCheckOracle.&&1..out

Execute the script for schemas needed:

oracle@aixSourceServer-Ora9i: /opt/oracle/software/goldengate>
sqlplus /nolog
SQL*Plus: Release 9.2.0.6.0 - Production on Thu Dec 10 21:19:37 2020
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
SQL> start full-schemaCheckOracle_07072020.sql HR
errors:
ERROR at line 4:
ORA-00904: "SUPPLEMENTAL_LOG_DATA_ALL": invalid identifier
platform_name
*
ERROR at line 2:
ORA-00904: "PLATFORM_NAME": invalid identifier
------ Integrated Extract unsupported objects in HR

select object_name, support_mode from DBA_GOLDENGATE_SUPPORT_MODE WHERE OWNER = :b0 and support_mode = 'NONE'
ERROR at line 1:
ORA-00942: table or view does not exist

 

The above errors can be ignored:

  • The errors ORA-00904: “SUPPLEMENTAL_LOG_DATA_ALL”: invalid identifier and ORA-00904: “PLATFORM_NAME”: invalid identifier can be ignored, since these columns do not exist in the data dictionary view v$database in Oracle 9.2.0.6.
  • The error ORA-00942: table or view does not exist can be ignored, since the view DBA_GOLDENGATE_SUPPORT_MODE is only available starting with Oracle Database 11g Release 2 (11.2.0.4).

Adapt the script and re-execute it; an output file is generated listing the different checks and any unsupported types.

For instance, the script lists all tables with no primary key or unique index, or tables with the NOLOGGING setting.

GoldenGate needs tables with a primary key. If no PK exists for a table, GoldenGate will use all columns to define uniqueness.

GOLDENGATE INSTALLATION – ON SOURCE SERVER

Download the zip file corresponding to the Oracle GoldenGate 11.1.1.1.2 software from https://edelivery.oracle.com:

  • V28955-01.zip Oracle GoldenGate 11.1.1.1.2 for Oracle 9.2 for AIX 5.3 on IBM AIX on POWER Systems (64-bit), 45.5 MB

Unzip and untar the file:

oracle@aixSourceServer-Ora9i: /opt/oracle/software/goldengate> ls -ltr
total 365456
-rw-rw-r--    1 oracle   dba       139079680 Oct  5 2011  ggs_AIX_ppc_ora9.2_64bit.tar
-rw-r--r--    1 oracle   dba          245329 Oct 28 2011  OGG_WinUnix_Rel_Notes_11.1.1.1.2.pdf
-rw-r--r--    1 oracle   dba           25065 Oct 28 2011  Oracle GoldenGate 11.1.1.1 README.txt
-rwx------    1 oracle   dba        47749729 Dec 10 13:55 V28955-01.zip
drwxr-xr-x    2 oracle   dba            4096 Dec 14 09:35 check_script
oracle@aixSourceServer-Ora9i:/opt/oracle/software/goldengate>

oracle@aixSourceServer-Ora9i:/opt/oracle/product/gg_11.1.1.1.2> tar -xvf /opt/oracle/software/goldengate/ggs_AIX_ppc_ora9.2_64bit.tar
…
x marker_setup.sql, 3702 bytes, 8 tape blocks
x marker_status.sql, 1715 bytes, 4 tape blocks
oracle@aixSourceServer-Ora9i: /opt/oracle/product/gg_11.1.1.1.2> ls -ltr
total 235344
-r--r--r-- 1 oracle dba 1476 Oct 15 2010 zlib.txt
. . .
-rwxr-xr-x 1 oracle dba 13911955 Oct 5 2011 replicat

Let’s set the LIBPATH environment variable and call the “ggsci” utility:

oracle@aixSourceServer-Ora9i: /opt/oracle/product/gg_11.1.1.1.2> export LIBPATH=/opt/oracle/product/gg_11.1.1.1.2/:$ORACLE_HOME/lib:/opt/oracle/product/9.2.0.6/lib32/:/opt/oracle/product/9.2.0.6/lib/:$LIBPATH
oracle@aixSourceServer-Ora9i: /opt/oracle/product/gg_11.1.1.1.2> ./ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.1.1.1.2 OGGCORE_11.1.1.1.2_PLATFORMS_111004.2100
AIX 5L, ppc, 64bit (optimized), Oracle 9.2 on Oct 5 2011 00:37:06

Copyright (C) 1995, 2011, Oracle and/or its affiliates. All rights reserved.

GGSCI (aixSourceServer-Ora9i) 1> info all

Program Status Group Lag Time Since Chkpt

MANAGER STOPPED

GOLDENGATE SETUP – ON SOURCE SERVER

Create the goldengate admin user on the source and target databases:

oracle@aixSourceServer-Ora9i:/home/oracle/ [DB2] sqlplus / as sysdba

SQL> create tablespace GOLDENGATE datafile '/u02/oradata/DB2/goldengate.dbf' size 2G ;

SQL> create profile GGADMIN limit password_life_time unlimited ;

SQL> create user GGADMIN identified by "******" default tablespace
     goldengate temporary tablespace temp profile GGADMIN ;

SQL> grant create session, dba to GGADMIN ;

SQL> exec dbms_goldengate_auth.grant_admin_privilege ('GGADMIN') ;

SQL> grant flashback any table to GGADMIN ;
--create subdirs
GGSCI (aixSourceServer-Ora9i) 1> create subdirs

Creating subdirectories under current directory /opt/oracle/product/gg_11.1.1.1.2

Parameter files                /opt/oracle/product/gg_11.1.1.1.2/dirprm: created
Report files                   /opt/oracle/product/gg_11.1.1.1.2/dirrpt: created
Checkpoint files               /opt/oracle/product/gg_11.1.1.1.2/dirchk: created
Process status files           /opt/oracle/product/gg_11.1.1.1.2/dirpcs: created
SQL script files               /opt/oracle/product/gg_11.1.1.1.2/dirsql: created
Database definitions files     /opt/oracle/product/gg_11.1.1.1.2/dirdef: created
Extract data files             /opt/oracle/product/gg_11.1.1.1.2/dirdat: created
Temporary files                /opt/oracle/product/gg_11.1.1.1.2/dirtmp: created
Veridata files                 /opt/oracle/product/gg_11.1.1.1.2/dirver: created
Veridata Lock files            /opt/oracle/product/gg_11.1.1.1.2/dirver/lock: created
Veridata Out-Of-Sync files     /opt/oracle/product/gg_11.1.1.1.2/dirver/oos: created
Veridata Out-Of-Sync XML files /opt/oracle/product/gg_11.1.1.1.2/dirver/oosxml: created
Veridata Parameter files       /opt/oracle/product/gg_11.1.1.1.2/dirver/params: created
Veridata Report files          /opt/oracle/product/gg_11.1.1.1.2/dirver/report: created
Veridata Status files          /opt/oracle/product/gg_11.1.1.1.2/dirver/status: created
Veridata Trace files           /opt/oracle/product/gg_11.1.1.1.2/dirver/trace: created
Stdout files                   /opt/oracle/product/gg_11.1.1.1.2/dirout: created

--add GGSCHEMA into ./GLOBALS file in source and target
oracle@ aixSourceServer-Ora9i: /opt/oracle/product/gg_11.1.1.1.2> ./ggsci
GGSCI (aixSourceServer-Ora9i) 3> view param ./GLOBALS
GGSCHEMA goldengate

--add PORT into mgr parameter file and start the manager
GGSCI (aixSourceServer-Ora9i) 1> edit params mgr

PORT 7809
GGSCI (aixSourceServer-Ora9i) 6> start mgr
Manager started.
GGSCI (aixSourceServer-Ora9i) 7> info all
Program     Status      Group       Lag           Time Since Chkpt
MANAGER     RUNNING

--Installing the DDL support on the source database: you will be prompted for the name of a schema for the GoldenGate database objects.
SQL> @marker_setup.sql
. . .
Script complete
SQL> @ddl_setup.sql
. . .
SUCCESSFUL installation of DDL Replication software components
SQL> @role_setup.sql
Role setup script complete
SQL> grant ggs_ggsuser_role to goldengate;
SQL> @ddl_enable.sql
Trigger altered
--On both database (source and target), Installing Support for Sequences
SQL> @sequence.sql
. . .
SUCCESSFUL installation of Oracle Sequence Replication support

Add the trandata for the schemas involved in the GoldenGate replication:

oracle@aixSourceServer-Ora9i: /opt/oracle/product/gg_11.1.1.1.2> ./ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.1.1.1.2 OGGCORE_11.1.1.1.2_PLATFORMS_111004.2100
AIX 5L, ppc, 64bit (optimized), Oracle 9.2 on Oct  5 2011 00:37:06

Copyright (C) 1995, 2011, Oracle and/or its affiliates. All rights reserved.
	


GGSCI (aixSourceServer-Ora9i) 1> dblogin userid goldengate
Password:
Successfully logged into database.


GGSCI (aixSourceServer-Ora9i) 2> add trandata bgh.*
GGSCI (aixSourceServer-Ora9i) 2> add trandata all_opi.*

2020-12-18 10:46:45  WARNING OGG-00706  Failed to add supplemental log group on table bgh.KLI_J_TEST_HIST due to ORA-02257: maximum number of columns exceeded, SQL ALTER TABLE "bgh"."KLI_J_TEST_HIST" ADD SUPPLEMENTAL LOG GROUP "GGS_KLI_J_TEST_HIST_901157" ("ENH_N_ID","ENH_N_NOINSCRIPTION","ENH_N_NOCOURS","ENH_C_P1NOTE","ENH_C_P2NOTE","ENH_C_P3NOTE","ENH_C_.


2020-12-18 10:46:52  WARNING OGG-00706  Failed to add supplemental log group on table bgh.TABLE_ELEVES_SVG due to ORA-02257: maximum number of columns exceeded, SQL ALTER TABLE "bgh"."TABLE_ELEVES" ADD SUPPLEMENTAL LOG GROUP "GGS_TABLE_ELEVES_901320" ("NOINSCRIPTION","NOCOURS","P1NOTE","P2NOTE","P3NOTE","P4NOTE","P5NOTE","P6NOTE","P7NOTE","P8NOTE","P1COMPTE".

2020-12-18 10:46:52  WARNING OGG-00869  No unique key is defined for table TABLENOTE_TMP. All viable columns will be used to represent the key, but may not guarantee uniqueness.  KEYCOLS may be used to define the key.

Logging of supplemental redo data enabled for table all_opi.ZUI_VM_RETUIO_SCOLARITE.
ERROR: OCI Error retrieving bind info for query (status = 100), SQL <SELECT * FROM "all_opi"."EXT_POI_V_RTEWR">.

The warnings OGG-00706 and OGG-00869 are resolved by adding a primary key to the tables concerned.
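
For example, a primary key could be added on the source as shown below (the constraint name and key columns here are purely hypothetical and must be chosen together with the application team); add trandata can then be re-run for the table:

SQL> alter table bgh.TABLENOTE_TMP add constraint TABLENOTE_TMP_PK primary key (NOINSCRIPTION, NOCOURS);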

The OCI error must be investigated by opening an Oracle Service Request.

Add the extracts and their exttrails, then start them:

GGSCI (aixSourceServer-Ora9i) 2> add extract EXTRSO, tranlog, begin now
EXTRACT added.

add extract EXTRNA, tranlog, begin now
GGSCI (aixSourceServer-Ora9i) 7> add EXTTRAIL /opt/oracle/goldengate/data/DDIP9/so, EXTRACT EXTRSO
EXTTRAIL added.

add EXTTRAIL /opt/oracle/goldengate/data/DDIP9/na, EXTRACT EXTRNA

edit param EXTRSO
Extract EXTRSO
userid goldengate password ******
Exttrail /opt/oracle/goldengate/data/DDIP9/so
ENCRYPTTRAIL AES192
DDL INCLUDE MAPPED OBJNAME bgh.*
Table bgh.*;

edit param EXTRNA
Extract EXTRNA
userid goldengate password ******
Exttrail /opt/oracle/goldengate/data/DDIP9/na
ENCRYPTTRAIL AES192
DDL INCLUDE MAPPED OBJNAME all_opi.*
Table all_opi.*;

start EXTRSO

Sending START request to MANAGER ...
EXTRACT EXTRSO starting

start EXTRNA

GGSCI (aixSourceServer-Ora9i) 11> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     EXTRHR      00:00:00      00:00:04
EXTRACT     RUNNING     EXTRSO      00:08:18      00:00:00
EXTRACT     RUNNING     PUMPHR      00:00:00      00:00:07

Check if trail files are created:

oracle@aixSourceServer-Ora9i: /opt/oracle/product/gg_11.1.1.1.2> ls -ltr /opt/oracle/goldengate/data/DDIP9/
total 72
-rw-rw-rw-    1 oracle   dba             960 Dec 16 09:34 hr000000
-rw-rw-rw-    1 oracle   dba            1021 Dec 16 10:25 hr000001
-rw-rw-rw-    1 oracle   dba            1021 Dec 16 10:34 hr000002
-rw-rw-rw-    1 oracle   dba            2679 Dec 16 14:26 hr000003
-rw-rw-rw-    1 oracle   dba             960 Dec 16 19:54 so000000
-rw-rw-rw-    1 oracle   dba            1021 Dec 16 19:59 na000000

Add the PUMP:

GGSCI (aixSourceServer-Ora9i) 1> add extract PUMPSO,EXTTRAILSOURCE /opt/oracle/goldengate/data/DDIP9/so
EXTRACT added.

add extract PUMPNA,EXTTRAILSOURCE /opt/oracle/goldengate/data/DDIP9/na

GGSCI (aixSourceServer-Ora9i) 2> add rmttrail /data/oradata/goldengate/data/LGGATE/so, extract PUMPSO
RMTTRAIL added.

add rmttrail /data/oradata/goldengate/data/LGGATE/na, extract PUMPNA

extract PUMPSO
userid goldengate password ******
RMTHOST aixTargetServer-Ora19c, MGRPORT 7810
RMTTRAIL /data/oradata/goldengate/data/LGGATE/so
TABLE bgh.*;

extract PUMPNA
userid goldengate password ******
RMTHOST aixTargetServer-Ora19c, MGRPORT 7810
RMTTRAIL /data/oradata/goldengate/data/LGGATE/na
TABLE all_opi.*;


GGSCI (aixSourceServer-Ora9i) 6> start pumpso

Sending START request to MANAGER ...
EXTRACT PUMPSO starting

start pumpna

GGSCI (aixSourceServer-Ora9i) 26> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     EXTRHR      00:00:00      00:00:07
EXTRACT     RUNNING     EXTRNA      00:00:00      00:00:08
EXTRACT     RUNNING     EXTRSO      00:00:00      00:00:05
EXTRACT     RUNNING     PUMPHR      00:00:00      00:00:03
EXTRACT     RUNNING     PUMPNA      00:00:00      00:03:42
EXTRACT     RUNNING     PUMPSO      00:00:00      00:00:00


GOLDENGATE INITIAL LOAD

On the source database, get the current SCN and run the exports of the schemas:

SELECT dbms_flashback.get_system_change_number as current_scn FROM DUAL;
10228186709471 --Back up this SCN, it will be used later to start the GoldenGate replicat processes on the target server
nohup  exp / file=rg081DDIP9.s0.202012172303.dmp log=rg081DDIP9.s0.202012172303.dmp.log tables=bgh.% flashback_scn=10228186709471 &
nohup  exp / file=rg081DDIP9.nbds_adm.202012172303.dmp log=rg081DDIP9.all_opi.202012172303.dmp.log tables=nbds_adm.% flashback_scn=10228186709471 &

Copy the dump file on the target and do the import :

drop user bgh cascade;
create user bgh identified by "******" default tablespace SO temporary tablespace TEMP;
alter user bgh quota unlimited on S0_D;
alter user bgh quota unlimited on S0_I;
alter user bgh quota unlimited on S0_LOB;
nohup imp / file=/data/export/LGGATE/rg081DDIP9.s0.202012172303.dmp log=so.imp171220202303.log buffer=1000000 fromuser=bgh touser=bgh grants=n statistics=none constraints=n ignore=y &

drop user all_opi cascade;
create user all_opi identified by "******" default tablespace NA temporary tablespace TEMP;
alter user all_opi quota unlimited on NBDS_D;
alter user all_opi quota unlimited on NBDS_I;
alter user all_opi quota unlimited on NBDS_LOB;
alter user all_opi quota unlimited on system;
alter user all_opi quota unlimited on na;
nohup imp / file=/data/export/LGGATE/rg081DDIP9.nbds_adm.202012172303.dmp log=na.imp171220202303.log buffer=1000000 fromuser=all_opi touser=all_opi grants=n statistics=none constraints=n ignore=y &

Since the import is done without the constraints, get all the primary keys from the source database and create them on the target.
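
One possible sketch, assuming the DDL is simply spooled from the source (DBMS_METADATA is available in 9.2) and replayed on the target; schema names are to be adapted:

SQL> set long 100000
SQL> set pagesize 0
SQL> spool create_pk.sql
SQL> select dbms_metadata.get_ddl('CONSTRAINT', constraint_name, owner)||';'
     from dba_constraints
     where owner in ('bgh','all_opi') and constraint_type = 'P';
SQL> spool off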

Disable all triggers on the target:

select 'alter trigger '||owner||'.'||trigger_name||' disable;' from dba_triggers where owner = 'NBDS_ADM';

Check that no referential constraints exist, set the job_queue_processes parameter to 0, and recompile all invalid objects:

--check ref constraints
SQL> select * from dba_constraints where owner = 'NBDS_ADM' and constraint_type = 'R';

no rows selected

SQL>
--check job_queue_processes

SQL> sho parameter job

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
job_queue_processes                  integer     384
max_datapump_jobs_per_pdb            string      100
max_datapump_parallel_per_job        string      50
SQL> alter system set job_queue_Processes = 0;

System altered.

SQL>

--recompile all
SQL> start ?/rdbms/admin/utlrp.sql

Session altered.
. . .

GOLDENGATE SETUP – ON TARGET SERVER

Install the latest version of the GoldenGate software for AIX from: https://www.oracle.com/middleware/technologies/goldengate-downloads.html.

The GoldenGate installation has nothing special; just follow the documentation: https://docs.oracle.com/en/middleware/goldengate/core/19.1/index.html

Do the standard database setup for GoldenGate as described in the documentation above.
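
As a minimal sketch only (user name, password, grants and tablespaces are placeholders to adapt to your environment), that setup essentially consists of enabling GoldenGate replication and creating the GoldenGate administration user:

SQL> alter system set enable_goldengate_replication=true scope=both;
SQL> create user goldengate identified by "******" default tablespace users temporary tablespace temp;
SQL> alter user goldengate quota unlimited on users;
SQL> grant create session, resource to goldengate;
SQL> exec dbms_goldengate_auth.grant_admin_privilege('GOLDENGATE');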

Under ggsci, create a wallet :

GGSCI (aixTargetServer-Ora19c) 11> create wallet

Created wallet.

Opened wallet.

GGSCI (aixTargetServer-Ora19c) 12> add credentialstore

Credential store created.

GGSCI (aixTargetServer-Ora19c) 13> alter credentialstore add user goldengate@LGGATE alias goldengate
Password:

Credential store altered.

GGSCI (aixTargetServer-Ora19c) 1> dblogin useridalias goldengate
Successfully logged into database.

GGSCI (aixTargetServer-Ora19c as goldengate@LGGATE) 2>

Add the replicat:

--add replicat
GGSCI (aixTargetServer-Ora19c) 5> dblogin useridalias goldengate
Successfully logged into database.

GGSCI (aixTargetServer-Ora19c as goldengate@LGGATE) 8> add replicat REPLSO, exttrail /data/oradata/goldengate/data/LGGATE/so,checkpointtable GOLDENGATE.CHECKPOINT;
REPLICAT added.


GGSCI (aixTargetServer-Ora19c as goldengate@LGGATE) 9> add replicat REPLNA, exttrail /data/oradata/goldengate/data/LGGATE/na,checkpointtable GOLDENGATE.CHECKPOINT;
REPLICAT added.


GGSCI (aixTargetServer-Ora19c as goldengate@LGGATE) 10> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
REPLICAT    RUNNING     REPLHR      00:00:00      00:00:09
REPLICAT    STOPPED     REPLNA      00:00:00      00:00:02
REPLICAT    STOPPED     REPLSO      00:00:00      00:00:17


GGSCI (aixTargetServer-Ora19c as goldengate@LGGATE) 11>

--configure replicat
Replicat REPLSO
--DBOPTIONS INTEGRATEDPARAMS (parallelism 6)
SOURCECHARSET PASSTHRU
DISCARDFILE /opt/oracle/product/gg_191004/dirrpt/REPLSO_discard.txt, append, megabytes 10
USERIDALIAS goldengate
ASSUMETARGETDEFS
MAP bgh.*,TARGET bgh.*;


Replicat REPLNA
--DBOPTIONS INTEGRATEDPARAMS (parallelism 6)
SOURCECHARSET PASSTHRU
DISCARDFILE /opt/oracle/product/gg_191004/dirrpt/REPLNA_discard.txt, append, megabytes 10
USERIDALIAS goldengate
ASSUMETARGETDEFS
MAP all_opi.*,TARGET all_opi.*;

--Start replicat REPLNA
GGSCI (aixTargetServer-Ora19c as goldengate@LGGATE) 18> start replicat REPLNA, atcsn 10228186709471

Sending START request to MANAGER ...
REPLICAT REPLNA starting

--Start replicat REPLS0
GGSCI (aixTargetServer-Ora19c as goldengate@LGGATE) 18> start replicat REPLSO, atcsn 10228186709471

Sending START request to MANAGER ...
REPLICAT REPLSO starting

GGSCI (aixTargetServer-Ora19c as goldengate@LGGATE) 19> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
REPLICAT    RUNNING     REPLHR      00:00:00      00:00:02
REPLICAT    RUNNING     REPLNA      00:00:00      00:00:01
REPLICAT    RUNNING     REPLSO      00:00:00      00:21:10

Wait until the lag decreases…

GOLDENGATE TEST SYNCHRONIZATION

Add some DML + DDL activity on the source database and check the synchronization with the GoldenGate “stats” command on both servers:
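
For instance, a small test workload like the following can be run on the source (the test table and its columns are purely illustrative):

SQL> create table bgh.GG_SYNC_TEST (id number primary key, label varchar2(30));
SQL> insert into bgh.GG_SYNC_TEST select rownum, 'row '||rownum from all_objects where rownum <= 10;
SQL> update bgh.GG_SYNC_TEST set label = upper(label) where id = 1;
SQL> delete from bgh.GG_SYNC_TEST where id > 8;
SQL> commit;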

GGSCI (aixSourceServer-Ora9i) 5> stats extract EXTRNA, totalsonly *.*

Sending STATS request to EXTRACT EXTRNA ...

Start of Statistics at 2020-12-18 16:35:00.

DDL replication statistics (for all trails):

*** Total statistics since extract started     ***
        Operations                                   0.00
        Mapped operations                            0.00
        Unmapped operations                          0.00
        Other operations                             0.00
        Excluded operations                          0.00

Output to /opt/oracle/goldengate/data/DDIP9/na:

Cumulative totals for specified table(s):

*** Total statistics since 2020-12-18 10:42:15 ***
        Total inserts                                8.00
        Total updates                                1.00
        Total deletes                               25.00
        Total discards                               0.00
        Total operations                            34.00

*** Daily statistics since 2020-12-18 10:42:15 ***
        Total inserts                                8.00
        Total updates                                1.00
        Total deletes                               25.00
        Total discards                               0.00
        Total operations                            34.00

*** Hourly statistics since 2020-12-18 16:00:00 ***

        No database operations have been performed.

*** Latest statistics since 2020-12-18 10:42:15 ***
        Total inserts                                8.00
        Total updates                                1.00
        Total deletes                               25.00
        Total discards                               0.00
        Total operations                            34.00

End of Statistics.

GGSCI (aixSourceServer-Ora9i) 10> stats extract EXTRSO, totalsonly *.*

Sending STATS request to EXTRACT EXTRSO ...

Start of Statistics at 2020-12-18 16:36:06.

DDL replication statistics (for all trails):

*** Total statistics since extract started     ***
        Operations                                   0.00
        Mapped operations                            0.00
        Unmapped operations                          0.00
        Other operations                             0.00
        Excluded operations                          0.00

Output to /opt/oracle/goldengate/data/DDIP9/so:

Cumulative totals for specified table(s):

*** Total statistics since 2020-12-18 10:42:15 ***
        Total inserts                              156.00
        Total updates                                0.00
        Total deletes                                0.00
        Total discards                               0.00
        Total operations                           156.00

*** Daily statistics since 2020-12-18 10:42:15 ***
        Total inserts                              156.00
        Total updates                                0.00
        Total deletes                                0.00
        Total discards                               0.00
        Total operations                           156.00

*** Hourly statistics since 2020-12-18 16:00:00 ***

        No database operations have been performed.

*** Latest statistics since 2020-12-18 10:42:15 ***
        Total inserts                              156.00
        Total updates                                0.00
        Total deletes                                0.00
        Total discards                               0.00
        Total operations                           156.00

End of Statistics.

--On the target server
GGSCI (aixTargetServer-Ora19c) 5> stats replicat REPLNA, totalsonly *.*

Sending STATS request to REPLICAT REPLNA ...

Start of Statistics at 2020-12-18 16:36:45.

DDL replication statistics:

*** Total statistics since replicat started     ***
        Operations                                         1.00
        Mapped operations                                  1.00
        Unmapped operations                                0.00
        Other operations                                   0.00
        Excluded operations                                0.00
        Errors                                             0.00
        Retried errors                                     0.00
        Discarded errors                                   0.00
        Ignored errors                                     0.00

Cumulative totals for specified table(s):

*** Total statistics since 2020-12-18 11:42:12 ***
        Total inserts                                    526.00
        Total updates                                      1.00
        Total deletes                                    543.00
        Total upserts                                      0.00
        Total discards                                     0.00
        Total operations                                1070.00

*** Daily statistics since 2020-12-18 11:42:12 ***
        Total inserts                                    526.00
        Total updates                                      1.00
        Total deletes                                    543.00
        Total upserts                                      0.00
        Total discards                                     0.00
        Total operations                                1070.00

*** Hourly statistics since 2020-12-18 16:00:00 ***

        No database operations have been performed.

*** Latest statistics since 2020-12-18 11:42:12 ***
        Total inserts                                    526.00
        Total updates                                      1.00
        Total deletes                                    543.00
        Total upserts                                      0.00
        Total discards                                     0.00
        Total operations                                1070.00

End of Statistics.


If you want to remove your GoldenGate configuration:

on source :
delete trandata hr.*
delete trandata bgh.*
delete trandata all_opi.*
drop user goldengate cascade;
SQL> @ddl_disable

on target :
SQL> drop user goldengate cascade;

User dropped.


SQL> sho parameter goldengate

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
enable_goldengate_replication        boolean     TRUE
resource_manage_goldengate           boolean     FALSE
SQL> alter system set enable_goldengate_replication=FALSE scope = both;

System altered.

SQL> sho parameter goldengate

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
enable_goldengate_replication        boolean     FALSE
resource_manage_goldengate           boolean     FALSE
SQL>

Conclusion

  • Synchronizing an Oracle database 9.2.0.6 to Oracle 19c (Oracle 19.7 in our case) with GoldenGate works! Of course, tests with more activity, as in real life (production database), must be done to evaluate all possible problems.
  • Oracle has made some enhancements to the Oracle GoldenGate software: we no longer need any parameter to convert the trail file format between different Oracle GoldenGate versions (as we had in the past between pre-10g and post-10g GoldenGate); the conversion is done automatically.
  • Using GoldenGate to migrate your Oracle 9i database to Oracle 19c must be compared with alternative migration solutions:
    • Transportable tablespaces
    • Export/Import or Data Pump
  • The choice mainly depends on the downtime available for the migration:
    • If you can afford some downtime, Oracle Export/Import, Data Pump or Transportable Tablespaces will be the better solution.
    • If near-zero downtime is required, GoldenGate could be a solution, but only if the application team (architect, project manager, developers) participates since, for instance, tables without a primary key will prevent GoldenGate from working; the developers must then choose the columns that are candidates to become the primary key on the source.

This article Migrate Oracle Database 9.2.0.6 to Oracle 19c using GoldenGate appeared first on Blog dbi services.

Optimizer Statistics Gathering – pending and history

Tue, 2020-12-29 09:01
By Franck Pachot

This was initially posted to CERN Database blog on Wednesday, 12 September 2018 where it seems to be lost. Here is a copy thanks to web.archive.org

Demo table

I create a table for the demo. The CTAS gathers statistics (12c online statistics gathering) with one row and then I insert more rows:



10:33:56 SQL> create table DEMO as select rownum n from dual;
Table DEMO created.
10:33:56 SQL> insert into DEMO select rownum n from xmltable('1 to 41');
41 rows inserted.
10:33:56 SQL> commit;
Commit complete.

The estimations are stale: the optimizer estimates 1 row (E-Rows) while there are 42 actual rows (A-Rows):



10:33:56 SQL> select /*+ gather_plan_statistics */ count(*) from DEMO;

  COUNT(*) 
  -------- 
        42 

10:33:57 SQL> select * from table(dbms_xplan.display_cursor(format=>'basic +rows +rowstats last'));

PLAN_TABLE_OUTPUT                                                
-----------------                                                
EXPLAINED SQL STATEMENT:                                         
------------------------                                         
select /*+ gather_plan_statistics */ count(*) from DEMO          
                                                                 
Plan hash value: 2180342005                                      
                                                                 
--------------------------------------------------------------   
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   
--------------------------------------------------------------   
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |   
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |   
|   2 |   TABLE ACCESS FULL| DEMO |      1 |      1 |     42 |   
--------------------------------------------------------------   

Pending Statistics

Here we are: I want to gather statistics on this table. But I will lower all risks by not publishing them immediately. Current statistics preferences are set to PUBLISH=TRUE:



10:33:58 SQL> select num_rows,cast(last_analyzed as timestamp),dbms_stats.get_prefs('PUBLISH',owner,table_name) from dba_tab_statistics where owner='DEMO' and table_name in ('DEMO');

  NUM_ROWS CAST(LAST_ANALYZEDASTIMESTAMP)    DBMS_STATS.GET_PREFS('PUBLISH',OWNER,TABLE_NAME)   
  -------- ------------------------------    ------------------------------------------------   
         1 12-SEP-18 10.33.56.000000000 AM   TRUE     
                                          

I set it to FALSE:



10:33:59 SQL> exec dbms_stats.set_table_prefs('DEMO','DEMO','publish','false');

PL/SQL procedure successfully completed.

10:34:00 SQL> select num_rows,cast(last_analyzed as timestamp),dbms_stats.get_prefs('PUBLISH',owner,table_name) from dba_tab_statistics where owner='DEMO' and table_name in ('DEMO');

  NUM_ROWS CAST(LAST_ANALYZEDASTIMESTAMP)    DBMS_STATS.GET_PREFS('PUBLISH',OWNER,TABLE_NAME)   
  -------- ------------------------------    ------------------------------------------------   
         1 12-SEP-18 10.33.56.000000000 AM   FALSE  
                                            

I’m now gathering stats as I want to:



10:34:01 SQL> exec dbms_stats.gather_table_stats('DEMO','DEMO');
PL/SQL procedure successfully completed.

Test Pending Statistics

They are not published. But to test my queries with those new stats, I can set my session to use pending statistics:



10:34:02 SQL> alter session set optimizer_use_pending_statistics=true;
Session altered.

Running my query again, I can see the good estimations (E-Rows=A-Rows)



10:34:03 SQL> select /*+ gather_plan_statistics */ count(*) from DEMO;

  COUNT(*) 
  -------- 
        42 

10:34:04 SQL> select * from table(dbms_xplan.display_cursor(format=>'basic +rows +rowstats last'));

PLAN_TABLE_OUTPUT                                                
-----------------                                                
EXPLAINED SQL STATEMENT:                                         
------------------------                                         
select /*+ gather_plan_statistics */ count(*) from DEMO          
                                                                 
Plan hash value: 2180342005                                      
                                                                 
--------------------------------------------------------------   
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   
--------------------------------------------------------------   
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |   
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |   
|   2 |   TABLE ACCESS FULL| DEMO |      1 |     42 |     42 |   
--------------------------------------------------------------   

The published statistics still show 1 row:



10:34:05 SQL> select num_rows,cast(last_analyzed as timestamp),dbms_stats.get_prefs('PUBLISH',owner,table_name) from dba_tab_statistics where owner='DEMO' and table_name in ('DEMO');

  NUM_ROWS CAST(LAST_ANALYZEDASTIMESTAMP)    DBMS_STATS.GET_PREFS('PUBLISH',OWNER,TABLE_NAME)   
  -------- ------------------------------    ------------------------------------------------   
         1 12-SEP-18 10.33.56.000000000 AM   FALSE            
                                  

But I can query the pending ones before publishing them:



10:34:05 SQL> c/dba_tab_statistics/dba_tab_pending_stats
  1* select num_rows,cast(last_analyzed as timestamp),dbms_stats.get_prefs('PUBLISH',owner,table_name) from dba_tab_pending_stats where owner='DEMO' and table_name in ('DEMO');
10:34:05 SQL> /

  NUM_ROWS CAST(LAST_ANALYZEDASTIMESTAMP)    DBMS_STATS.GET_PREFS('PUBLISH',OWNER,TABLE_NAME)   
  -------- ------------------------------    ------------------------------------------------   
        42 12-SEP-18 10.34.01.000000000 AM   FALSE          
                                    

I’ve finished my test with pending statistics:



10:34:05 SQL> alter session set optimizer_use_pending_statistics=false;
Session altered.

Note that if you have Real Application Testing, you can use SQL Performance Analyzer to test the pending statistics on a whole SQL Tuning Set representing the critical queries of your application. Of course, the more you test there, the better it is.
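
As a rough sketch only, assuming a SQL Tuning Set named MY_CRITICAL_STS already exists and that the test executions run in the current session, where pending statistics can be enabled:

SQL> variable tname varchar2(64)
SQL> exec :tname := dbms_sqlpa.create_analysis_task(sqlset_name => 'MY_CRITICAL_STS');
SQL> exec dbms_sqlpa.execute_analysis_task(:tname, execution_type => 'TEST EXECUTE', execution_name => 'published_stats');
SQL> alter session set optimizer_use_pending_statistics = true;
SQL> exec dbms_sqlpa.execute_analysis_task(:tname, execution_type => 'TEST EXECUTE', execution_name => 'pending_stats');
SQL> alter session set optimizer_use_pending_statistics = false;
SQL> exec dbms_sqlpa.execute_analysis_task(:tname, execution_type => 'COMPARE PERFORMANCE', execution_name => 'compare');
SQL> select dbms_sqlpa.report_analysis_task(:tname, 'TEXT') from dual;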

Delete Pending Statistics

Now let’s say that my test shows that the new statistics are not good, I can simply delete the pending statistics:



10:34:06 SQL> exec dbms_stats.delete_pending_stats('DEMO','DEMO');
PL/SQL procedure successfully completed.

Then all queries are still using the previous statistics:



10:34:07 SQL> show parameter pending
NAME                             TYPE    VALUE
-------------------------------- ------- -----
optimizer_use_pending_statistics boolean FALSE

10:34:07 SQL> select /*+ gather_plan_statistics */ count(*) from DEMO;

  COUNT(*) 
  -------- 
        42 

10:34:08 SQL> select * from table(dbms_xplan.display_cursor(format=>'basic +rows +rowstats last'));

PLAN_TABLE_OUTPUT                                                
-----------------                                                
EXPLAINED SQL STATEMENT:                                         
------------------------                                         
select /*+ gather_plan_statistics */ count(*) from DEMO          
                                                                 
Plan hash value: 2180342005                                      
                                                                 
--------------------------------------------------------------   
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   
--------------------------------------------------------------   
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |   
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |   
|   2 |   TABLE ACCESS FULL| DEMO |      1 |      1 |     42 |   
--------------------------------------------------------------   

Accept Pending Statistics

Now I’ll show the second case where my tests show that the new statistics gathering is ok. I gather statistics again:



10:34:09 SQL> exec dbms_stats.gather_table_stats('DEMO','DEMO');
PL/SQL procedure successfully completed.

10:34:09 SQL> alter session set optimizer_use_pending_statistics=true;
Session altered.

10:34:11 SQL> select /*+ gather_plan_statistics */ count(*) from DEMO;

  COUNT(*) 
  -------- 
        42 


10:34:12 SQL> select * from table(dbms_xplan.display_cursor(format=>'basic +rows +rowstats last'));

PLAN_TABLE_OUTPUT                                                
-----------------                                                
EXPLAINED SQL STATEMENT:                                         
------------------------                                         
select /*+ gather_plan_statistics */ count(*) from DEMO          
                                                                 
Plan hash value: 2180342005                                      
                                                                 
--------------------------------------------------------------   
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   
--------------------------------------------------------------   
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |   
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |   
|   2 |   TABLE ACCESS FULL| DEMO |      1 |     42 |     42 |   
--------------------------------------------------------------   
                                                                 
10:34:12 SQL> alter session set optimizer_use_pending_statistics=false;
Session altered.

When I’m ok with the new statistics I can publish them so that other sessions can see them. As doing this in production is probably a fix for a critical problem, I want the effects to take place immediately, invalidating all cursors:



10:34:13 SQL> exec dbms_stats.publish_pending_stats('DEMO','DEMO',no_invalidate=>false);
PL/SQL procedure successfully completed.

The default NO_INVALIDATE value is probably to be avoided in those cases because you want to see the side effects, if any, as soon as possible, not within a random window of up to 5 hours later, when you have left the office. I set the table preference back to PUBLISH=TRUE and check that the new statistics are visible in DBA_TAB_STATISTICS (and no longer in DBA_TAB_PENDING_STATS):



10:34:14 SQL> exec dbms_stats.set_table_prefs('DEMO','DEMO','publish','true');
PL/SQL procedure successfully completed.

10:34:15 SQL> select num_rows,cast(last_analyzed as timestamp),dbms_stats.get_prefs('PUBLISH',owner,table_name) from dba_tab_statistics where owner='DEMO' and table_name in ('DEMO');

  NUM_ROWS CAST(LAST_ANALYZEDASTIMESTAMP)    DBMS_STATS.GET_PREFS('PUBLISH',OWNER,TABLE_NAME)   
  -------- ------------------------------    ------------------------------------------------   
        42 12-SEP-18 10.34.09.000000000 AM   TRUE                                               


10:34:15 SQL> c/dba_tab_statistics/dba_tab_pending_stats
  1* select num_rows,cast(last_analyzed as timestamp),dbms_stats.get_prefs('PUBLISH',owner,table_name) from dba_tab_pending_stats where owner='DEMO' and table_name in ('DEMO');
10:34:15 SQL> /

no rows selected

Report Differences

Then what if a critical regression is observed later? I still have the possibility to revert to the old statistics. First I can check in detail what has changed:



10:34:16 SQL> select report from table(dbms_stats.diff_table_stats_in_history('DEMO','DEMO',sysdate-1,sysdate,0));

REPORT
------

###############################################################################

STATISTICS DIFFERENCE REPORT FOR:
.................................

TABLE         : DEMO
OWNER         : DEMO
SOURCE A      : Statistics as of 11-SEP-18 10.34.16.000000 AM EUROPE/ZURICH
SOURCE B      : Statistics as of 12-SEP-18 10.34.16.000000 AM EUROPE/ZURICH
PCTTHRESHOLD  : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


TABLE / (SUB)PARTITION STATISTICS DIFFERENCE:
.............................................

OBJECTNAME                  TYP SRC ROWS       BLOCKS     ROWLEN     SAMPSIZE
...............................................................................

DEMO                        T   A   1          4          3          1
                                B   42         8          3          42
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

COLUMN STATISTICS DIFFERENCE:
.............................

COLUMN_NAME     SRC NDV     DENSITY    HIST NULLS   LEN  MIN   MAX   SAMPSIZ
...............................................................................

N               A   1       1          NO   0       3    C102  C102  1
                B   41      .024390243 NO   0       3    C102  C12A  42
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


NO DIFFERENCE IN INDEX / (SUB)PARTITION STATISTICS
###############################################################################

Restore Previous Statistics

If nothing is obvious and the regression is more critical than the original problem, I still have the possibility to revert back to the old statistics:



10:34:17 SQL> exec dbms_stats.restore_table_stats('DEMO','DEMO',sysdate-1,no_invalidate=>false);
PL/SQL procedure successfully completed.

Again, invalidating all cursors immediately is probably required as I solve a critical problem here. Immediately, the same query uses the old statistics:



10:34:17 SQL> select /*+ gather_plan_statistics */ count(*) from DEMO;

  COUNT(*) 
  -------- 
        42 


10:34:17 SQL> select * from table(dbms_xplan.display_cursor(format=>'basic +rows +rowstats last'));

PLAN_TABLE_OUTPUT                                                
-----------------                                                
EXPLAINED SQL STATEMENT:                                         
------------------------                                         
select /*+ gather_plan_statistics */ count(*) from DEMO          
                                                                 
Plan hash value: 2180342005                                      
                                                                 
--------------------------------------------------------------   
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   
--------------------------------------------------------------   
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |   
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |   
|   2 |   TABLE ACCESS FULL| DEMO |      1 |      1 |     42 |
--------------------------------------------------------------   

If I want to see what happened recently on this table, I can query the history of operations (you can replace my ugly regexp_replace with XQuery):



10:34:18 SQL> select end_time,end_time-start_time,operation,target,regexp_replace(regexp_replace(notes,'" val="','=>'),'(||)',' '),status from DBA_OPTSTAT_OPERATIONS where regexp_like(target,'"?'||'DEMO'||'"?."?'||'DEMO'||'"?') order by end_time desc fetch first 10 rows only;

END_TIME                                 END_TIME-START_TIME   OPERATION             TARGET          REGEXP_REPLACE(REGEXP_REPLACE(NOTES,'"VAL="','=>'),'(||)','')                                                                                                                                                                                                                                         STATUS      
--------                                 -------------------   ---------             ------          ----------------------------------------------------------------------------------------------                                                                                                                                                                                                                                         ------      
12-SEP-18 10.34.17.718800000 AM +02:00   +00 00:00:00.017215   restore_table_stats   "DEMO"."DEMO"     as_of_timestamp=>09-11-2018 10:34:17  force=>FALSE  no_invalidate=>FALSE  ownname=>DEMO  restore_cluster_index=>FALSE  tabname=>DEMO                                                                                                                                                                                                 COMPLETED   
12-SEP-18 10.34.13.262234000 AM +02:00   +00 00:00:00.010021   restore_table_stats   "DEMO"."DEMO"     as_of_timestamp=>11-30-3000 01:00:00  force=>FALSE  no_invalidate=>FALSE  ownname=>DEMO  restore_cluster_index=>FALSE  tabname=>DEMO                                                                                                                                                                                                 COMPLETED   
12-SEP-18 10.34.09.974873000 AM +02:00   +00 00:00:00.032513   gather_table_stats    "DEMO"."DEMO"     block_sample=>FALSE  cascade=>NULL  concurrent=>FALSE  degree=>NULL  estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE  force=>FALSE  granularity=>AUTO  method_opt=>FOR ALL COLUMNS SIZE AUTO  no_invalidate=>NULL  ownname=>DEMO  partname=>  reporting_mode=>FALSE  statid=>  statown=>  stattab=>  stattype=>DATA  tabname=>DEMO     COMPLETED   
12-SEP-18 10.34.01.194735000 AM +02:00   +00 00:00:00.052087   gather_table_stats    "DEMO"."DEMO"     block_sample=>FALSE  cascade=>NULL  concurrent=>FALSE  degree=>NULL  estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE  force=>FALSE  granularity=>AUTO  method_opt=>FOR ALL COLUMNS SIZE AUTO  no_invalidate=>NULL  ownname=>DEMO  partname=>  reporting_mode=>FALSE  statid=>  statown=>  stattab=>  stattype=>DATA  tabname=>DEMO     COMPLETED   

We can see here that the publishing of pending stats was actually a restore of stats as of Nov 30th of Year 3000. This is probably because the pending status is hardcoded as a date in the future. Does that mean that all pending stats will become autonomously published at that time? I don’t think we have to worry about Y3K bugs for the moment…

Here is the full recipe I’ve given to an application owner who needs to gather statistics on his tables on a highly critical database. He then has all the information needed to limit the risks. My recommendation is to prepare this fallback scenario before doing any change, and to test it as I did, on a test environment, in order to be ready to react to any unexpected side effect. Be careful: pending statistics do not work correctly with system statistics and can have very nasty side effects (Bug 21326597), but restoring from history is possible.
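
Condensed, that recipe looks like this (schema and table names are placeholders to adapt, and each step should be validated before moving to the next one):

-- keep the freshly gathered statistics pending instead of publishing them
exec dbms_stats.set_table_prefs('APP_OWNER','APP_TABLE','PUBLISH','FALSE');
exec dbms_stats.gather_table_stats('APP_OWNER','APP_TABLE');
-- test with pending statistics in a dedicated session
alter session set optimizer_use_pending_statistics = true;
-- ... run the critical queries and check the execution plans ...
alter session set optimizer_use_pending_statistics = false;
-- if the plans are fine: publish, invalidating the cursors immediately
exec dbms_stats.publish_pending_stats('APP_OWNER','APP_TABLE',no_invalidate=>false);
exec dbms_stats.set_table_prefs('APP_OWNER','APP_TABLE','PUBLISH','TRUE');
-- if not: simply drop the pending statistics
exec dbms_stats.delete_pending_stats('APP_OWNER','APP_TABLE');
-- fallback if a regression shows up later: restore from history
exec dbms_stats.restore_table_stats('APP_OWNER','APP_TABLE',sysdate-1,no_invalidate=>false);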

This article Optimizer Statistics Gathering – pending and history appeared first on Blog dbi services.

Building a network bonding between 2 cards on Oracle Linux

Mon, 2020-12-28 10:57

I recently needed to configure bonding between 2 network cards at a customer site and I wanted, through this blog, to share my findings and how I built it, showing some traces. I will also do a short comparison of what is possible or not on the ODA.

Why should I use bonding?

Bonding is a technology which allows you to merge several network interfaces, either ports of the same card or ports from separate network cards, into a single logical interface. The purpose is either to have some network redundancy in case of network failure, called fault tolerance, or to increase the network throughput (bandwidth), called load balancing.

What bonding mode should I use?

There are 7 bonding modes available to achieve these purposes. All bonding modes guarantee fault tolerance. Some bonding modes also provide load balancing functionality. For bonding mode 4 the switch needs to support link aggregation (EtherChannel). Link aggregation can be configured manually on the switch or automatically using the LACP protocol (dynamic link aggregation).

Mode 0 (Round-Robin): Packets are sequentially transmitted and received through each interface one by one. Fault tolerance: YES. Load balancing: YES.

Mode 1 (Active-backup): Only one interface is active. The other interfaces of the bonding configuration are configured as backup. If the active interface fails, one of the backup interfaces becomes the active one. The MAC address is only visible on one port at a time to avoid any confusion for the switch. Fault tolerance: YES. Load balancing: NO.

Mode 2 (Balance-xor): Peer connections are matched with the MAC addresses of the slave interfaces. Once the connection is established, the transmission to a peer is always sent over the same slave interface. Fault tolerance: YES. Load balancing: YES.

Mode 3 (Broadcast): All network transmissions are sent on all slaves. Fault tolerance: YES. Load balancing: NO.

Mode 4 (802.3ad – Dynamic Link Aggregation): This mode aggregates all interfaces of the bonding into a logical one. The traffic is sent and received on all slaves of the aggregation. The switch needs to support LACP and LACP needs to be activated. Fault tolerance: YES. Load balancing: YES.

Mode 5 (TLB – Transmit Load Balancing): The outgoing traffic is distributed between all interfaces depending on the current load of each slave interface. Incoming traffic is received by the current active slave. If the active interface fails, another slave takes over the MAC address of the failed interface. Fault tolerance: YES. Load balancing: YES.

Mode 6 (ALB – Adaptive Load Balancing): This mode includes TLB (Transmit Load Balancing) and uses RLB (Receive Load Balancing) as well. The load balancing of the received packets is done through ARP (Address Resolution Protocol) negotiation. Fault tolerance: YES. Load balancing: YES.

In my case, our customer wanted to guarantee the service in case of one network card failure only. No load balancing. The switch was not configured to use LACP. I then decided to configure the bonding in active-backup mode, which will guarantee redundancy only.

Bonding configuration

Checking existing connection

The server is composed of 2 network cards having each of the card 4 interfaces (ports).
Card 1 : em1, em2, em3, em4
Card 2 : p4p1, p4p2, p4p3, p4p4

There is currently no bonding configured, as shown in the output below.

[root@SRV ~]# nmcli connection
NAME  UUID                                  TYPE      DEVICE
p4p1  d3cdc8f5-2d80-433d-9502-3b357c57f307  ethernet  p4p1
em1   f412b74b-2160-4914-b716-88f6b4d58c1f  ethernet  --
em2   0ab78e63-bde7-4c77-b455-7dcb1d5c6813  ethernet  --
em3   d6569615-322f-477b-9693-b42ee3dbe21e  ethernet  --
em4   52949f94-52d1-463e-ba32-06c272c07ce0  ethernet  --
p4p2  12f01c70-4aab-42db-b0e8-b5422e43c1b9  ethernet  --
p4p3  0db2f5b9-d968-44cb-a042-cff20f112ed4  ethernet  --
p4p4  a2a0ebc4-ca74-452e-94ba-6d5fedbfdf28  ethernet  --


Checking existing configuration

The server was configured only with one IP address on the p4p1 network interface.

[root@SRV network-scripts]# pwd
/etc/sysconfig/network-scripts

[root@SRV network-scripts]# ls -l ifcfg*
-rw-r--r--. 1 root root 275 Sep 21 17:09 ifcfg-em1
-rw-r--r--. 1 root root 275 Sep 21 17:09 ifcfg-em2
-rw-r--r--. 1 root root 275 Sep 21 17:09 ifcfg-em3
-rw-r--r--. 1 root root 275 Sep 21 17:09 ifcfg-em4
-rw-r--r--. 1 root root 254 Aug 19  2019 ifcfg-lo
-rw-r--r--. 1 root root 378 Sep 21 17:09 ifcfg-p4p1
-rw-r--r--. 1 root root 277 Sep 21 17:09 ifcfg-p4p2
-rw-r--r--. 1 root root 277 Sep 21 17:09 ifcfg-p4p3
-rw-r--r--. 1 root root 277 Sep 21 17:09 ifcfg-p4p4

[root@SRV network-scripts]# more ifcfg-p4p1
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=p4p1
UUID=d3cdc8f5-2d80-433d-9502-3b357c57f307
DEVICE=p4p1
ONBOOT=yes
IPADDR=192.168.1.180
PREFIX=24
GATEWAY=192.168.1.1
DNS1=192.168.1.5
DOMAIN=domain.com
IPV6_PRIVACY=no


Creating the bonding

The purpose is to create a bonding between the 2 network cards for fault tolerance. The bonding will be composed of the slave interfaces p4p1 and em1.
The selected bonding mode will be mode 1 (active-backup).

[root@SRV network-scripts]# nmcli con add type bond con-name bond1 ifname bond1 mode active-backup ip4 192.168.1.180/24
Connection 'bond1' (7b736616-f72d-46b7-b4eb-01468639889b) successfully added.

[root@SRV network-scripts]# nmcli conn
NAME   UUID                                  TYPE      DEVICE
p4p1   d3cdc8f5-2d80-433d-9502-3b357c57f307  ethernet  p4p1
bond1  7b736616-f72d-46b7-b4eb-01468639889b  bond      bond1
em1    f412b74b-2160-4914-b716-88f6b4d58c1f  ethernet  --
em2    0ab78e63-bde7-4c77-b455-7dcb1d5c6813  ethernet  --
em3    d6569615-322f-477b-9693-b42ee3dbe21e  ethernet  --
em4    52949f94-52d1-463e-ba32-06c272c07ce0  ethernet  --
p4p2   12f01c70-4aab-42db-b0e8-b5422e43c1b9  ethernet  --
p4p3   0db2f5b9-d968-44cb-a042-cff20f112ed4  ethernet  --
p4p4   a2a0ebc4-ca74-452e-94ba-6d5fedbfdf28  ethernet  --


Updating the bonding with the appropriate gateway, DNS and domain information
[root@SRV network-scripts]# cat ifcfg-bond1
BONDING_OPTS=mode=active-backup
TYPE=Bond
BONDING_MASTER=yes
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
IPADDR=192.168.1.180
PREFIX=24
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=bond1
UUID=7b736616-f72d-46b7-b4eb-01468639889b
DEVICE=bond1
ONBOOT=yes

[root@SRV network-scripts]# vi ifcfg-bond1

[root@SRV network-scripts]# cat ifcfg-bond1
BONDING_OPTS=mode=active-backup
TYPE=Bond
BONDING_MASTER=yes
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
IPADDR=192.168.1.180
PREFIX=24
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=bond1
UUID=7b736616-f72d-46b7-b4eb-01468639889b
DEVICE=bond1
ONBOOT=yes
GATEWAY=192.168.1.1
DNS1=192.168.1.5
DOMAIN=domain.com


Adding slave interface em1 to the bonding bond1

Each slave needs to be added to the master bonding.

We will first delete existing em1 slave :

[root@SRV network-scripts]# nmcli con delete em1
Connection 'em1' (f412b74b-2160-4914-b716-88f6b4d58c1f) successfully deleted.


We will then create new em1 interface part of the bond1 bonding configuration :

[root@SRV network-scripts]# nmcli con add type bond-slave ifname em1 con-name em1 master bond1
Connection 'em1' (8c72c383-e1e9-4e4b-ac2f-3d3d81d5159b) successfully added.


And we can check the interfaces :

[root@SRV network-scripts]# nmcli con
NAME   UUID                                  TYPE      DEVICE
p4p1   d3cdc8f5-2d80-433d-9502-3b357c57f307  ethernet  p4p1
bond1  7b736616-f72d-46b7-b4eb-01468639889b  bond      bond1
em1    8c72c383-e1e9-4e4b-ac2f-3d3d81d5159b  ethernet  em1
em2    0ab78e63-bde7-4c77-b455-7dcb1d5c6813  ethernet  --
em3    d6569615-322f-477b-9693-b42ee3dbe21e  ethernet  --
em4    52949f94-52d1-463e-ba32-06c272c07ce0  ethernet  --
p4p2   12f01c70-4aab-42db-b0e8-b5422e43c1b9  ethernet  --
p4p3   0db2f5b9-d968-44cb-a042-cff20f112ed4  ethernet  --
p4p4   a2a0ebc4-ca74-452e-94ba-6d5fedbfdf28  ethernet  --


Activating the bonding

We first need to activate the configured slave:

[root@SRV network-scripts]# nmcli con up em1
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/4)


We can now activate the bonding :

[root@SRV network-scripts]# nmcli con up bond1
Connection successfully activated (master waiting for slaves) (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/5)


We can check the connections :

[root@SRV network-scripts]# nmcli con
NAME   UUID                                  TYPE      DEVICE
p4p1   d3cdc8f5-2d80-433d-9502-3b357c57f307  ethernet  p4p1
bond1  7b736616-f72d-46b7-b4eb-01468639889b  bond      bond1
em1    8c72c383-e1e9-4e4b-ac2f-3d3d81d5159b  ethernet  em1
em2    0ab78e63-bde7-4c77-b455-7dcb1d5c6813  ethernet  --
em3    d6569615-322f-477b-9693-b42ee3dbe21e  ethernet  --
em4    52949f94-52d1-463e-ba32-06c272c07ce0  ethernet  --
p4p2   12f01c70-4aab-42db-b0e8-b5422e43c1b9  ethernet  --
p4p3   0db2f5b9-d968-44cb-a042-cff20f112ed4  ethernet  --
p4p4   a2a0ebc4-ca74-452e-94ba-6d5fedbfdf28  ethernet  --


Adding slave interface p4p1 to the bonding bond1

We will first delete existing p4p1 slave :

[root@SRV network-scripts]# nmcli con delete p4p1
Connection 'p4p1' (d3cdc8f5-2d80-433d-9502-3b357c57f307) successfully deleted.

[root@SRV network-scripts]# nmcli con
NAME   UUID                                  TYPE      DEVICE
bond1  7b736616-f72d-46b7-b4eb-01468639889b  bond      bond1
em1    8c72c383-e1e9-4e4b-ac2f-3d3d81d5159b  ethernet  em1
em2    0ab78e63-bde7-4c77-b455-7dcb1d5c6813  ethernet  --
em3    d6569615-322f-477b-9693-b42ee3dbe21e  ethernet  --
em4    52949f94-52d1-463e-ba32-06c272c07ce0  ethernet  --
p4p2   12f01c70-4aab-42db-b0e8-b5422e43c1b9  ethernet  --
p4p3   0db2f5b9-d968-44cb-a042-cff20f112ed4  ethernet  --
p4p4   a2a0ebc4-ca74-452e-94ba-6d5fedbfdf28  ethernet  --


We will then create new p4p1 interface part of the bond1 bonding configuration :

[root@SRV network-scripts]# nmcli con add type bond-slave ifname p4p1 con-name p4p1 master bond1
Connection 'p4p1' (efef0972-4b3f-46a2-b054-ebd1aa201056) successfully added.


And we can check the interfaces :

[root@SRV network-scripts]# nmcli con
NAME   UUID                                  TYPE      DEVICE
bond1  7b736616-f72d-46b7-b4eb-01468639889b  bond      bond1
em1    8c72c383-e1e9-4e4b-ac2f-3d3d81d5159b  ethernet  em1
p4p1   efef0972-4b3f-46a2-b054-ebd1aa201056  ethernet  p4p1
em2    0ab78e63-bde7-4c77-b455-7dcb1d5c6813  ethernet  --
em3    d6569615-322f-477b-9693-b42ee3dbe21e  ethernet  --
em4    52949f94-52d1-463e-ba32-06c272c07ce0  ethernet  --
p4p2   12f01c70-4aab-42db-b0e8-b5422e43c1b9  ethernet  --
p4p3   0db2f5b9-d968-44cb-a042-cff20f112ed4  ethernet  --
p4p4   a2a0ebc4-ca74-452e-94ba-6d5fedbfdf28  ethernet  --


Activating the new p4p1 slave interface

We can now activate the recently added slave:

[root@SRV network-scripts]# nmcli con up p4p1
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/11)


Restart the network service

We will restart the network service so that the new bonding configuration is taken into account:

[root@SRV network-scripts]# service network restart
Restarting network (via systemctl):                        [  OK  ]


We can check the IP configuration :

[root@SRV network-scripts]# ip addr sh
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: em1:  mtu 1500 qdisc mq master bond1 state UP group default qlen 1000
    link/ether bc:97:e1:5b:e4:50 brd ff:ff:ff:ff:ff:ff
3: em3:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether bc:97:e1:5b:e4:4e brd ff:ff:ff:ff:ff:ff
4: em2:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether bc:97:e1:5b:e4:51 brd ff:ff:ff:ff:ff:ff
5: em4:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether bc:97:e1:5b:e4:4f brd ff:ff:ff:ff:ff:ff
6: p4p1:  mtu 1500 qdisc mq master bond1 state UP group default qlen 1000
    link/ether bc:97:e1:5b:e4:50 brd ff:ff:ff:ff:ff:ff
7: p4p2:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 3c:fd:fe:85:0d:31 brd ff:ff:ff:ff:ff:ff
8: p4p3:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 3c:fd:fe:85:0d:32 brd ff:ff:ff:ff:ff:ff
9: p4p4:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 3c:fd:fe:85:0d:33 brd ff:ff:ff:ff:ff:ff
11: bond1:  mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether bc:97:e1:5b:e4:50 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.180/24 brd 192.168.1.255 scope global noprefixroute bond1
       valid_lft forever preferred_lft forever
    inet6 fe80::b4f9:e44d:25fc:3a6/64 scope link noprefixroute
       valid_lft forever preferred_lft forever


Check IP configuration files

We now have our bond ifcfg configuration file:

[root@SRV ~]# cd /etc/sysconfig/network-scripts

[root@SRV network-scripts]# pwd
/etc/sysconfig/network-scripts

[root@SRV network-scripts]# ls -ltrh ifcfg*
-rw-r--r--. 1 root root 254 Aug 19  2019 ifcfg-lo
-rw-r--r--. 1 root root 277 Sep 21 17:09 ifcfg-p4p4
-rw-r--r--. 1 root root 277 Sep 21 17:09 ifcfg-p4p2
-rw-r--r--. 1 root root 275 Sep 21 17:09 ifcfg-em4
-rw-r--r--. 1 root root 275 Sep 21 17:09 ifcfg-em3
-rw-r--r--. 1 root root 277 Sep 21 17:09 ifcfg-p4p3
-rw-r--r--. 1 root root 275 Sep 21 17:09 ifcfg-em2
-rw-r--r--. 1 root root 411 Oct  7 16:45 ifcfg-bond1
-rw-r--r--. 1 root root 110 Oct  7 16:46 ifcfg-em1
-rw-r--r--. 1 root root 112 Oct  7 16:50 ifcfg-p4p1


The bonding file holds the IP configuration:

[root@SRV network-scripts]# cat ifcfg-bond1
BONDING_OPTS=mode=active-backup
TYPE=Bond
BONDING_MASTER=yes
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
IPADDR=192.168.1.180
PREFIX=24
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=bond1
UUID=7b736616-f72d-46b7-b4eb-01468639889b
DEVICE=bond1
ONBOOT=yes
GATEWAY=192.168.1.1
DNS1=192.168.1.5
DOMAIN=domain.com


The p4p1 interface is one of the bond1 slaves:

[root@SRV network-scripts]# cat ifcfg-p4p1
TYPE=Ethernet
NAME=p4p1
UUID=efef0972-4b3f-46a2-b054-ebd1aa201056
DEVICE=p4p1
ONBOOT=yes
MASTER=bond1
SLAVE=yes


The em1 interface, from the other physical network card, is the other bond1 slave:

[root@SRV network-scripts]# cat ifcfg-em1
TYPE=Ethernet
NAME=em1
UUID=8c72c383-e1e9-4e4b-ac2f-3d3d81d5159b
DEVICE=em1
ONBOOT=yes
MASTER=bond1
SLAVE=yes


Check bonding interfaces and mode
[root@SRV network-scripts]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: em1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: em1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: bc:97:e1:5b:e4:50
Slave queue ID: 0

Slave Interface: p4p1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 3c:fd:fe:85:0d:30
Slave queue ID: 0
[root@SRV network-scripts]#


Test the bonding

Both network cables are plugged into em1 and p4p1. Both interfaces are UP:

[root@SRV network-scripts]# ip addr sh
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: em1:  mtu 1500 qdisc mq master bond1 state UP group default qlen 1000
    link/ether bc:97:e1:5b:e4:50 brd ff:ff:ff:ff:ff:ff
3: em3:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether bc:97:e1:5b:e4:4e brd ff:ff:ff:ff:ff:ff
4: em2:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether bc:97:e1:5b:e4:51 brd ff:ff:ff:ff:ff:ff
5: em4:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether bc:97:e1:5b:e4:4f brd ff:ff:ff:ff:ff:ff
6: p4p1:  mtu 1500 qdisc mq master bond1 state UP group default qlen 1000
    link/ether bc:97:e1:5b:e4:50 brd ff:ff:ff:ff:ff:ff
7: p4p2:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 3c:fd:fe:85:0d:31 brd ff:ff:ff:ff:ff:ff
8: p4p3:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 3c:fd:fe:85:0d:32 brd ff:ff:ff:ff:ff:ff
9: p4p4:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 3c:fd:fe:85:0d:33 brd ff:ff:ff:ff:ff:ff
15: bond1:  mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether bc:97:e1:5b:e4:50 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.180/24 brd 192.168.1.255 scope global noprefixroute bond1
       valid_lft forever preferred_lft forever
    inet6 fe80::b4f9:e44d:25fc:3a6/64 scope link noprefixroute
       valid_lft forever preferred_lft forever


Pinging the server is OK :

[ansible@linux-ansible / ]$ ping 192.168.1.180
PING 192.168.1.180 (192.168.1.180) 56(84) bytes of data.
64 bytes from 192.168.1.180: icmp_seq=1 ttl=64 time=0.206 ms
64 bytes from 192.168.1.180: icmp_seq=2 ttl=64 time=0.290 ms
64 bytes from 192.168.1.180: icmp_seq=3 ttl=64 time=0.152 ms
64 bytes from 192.168.1.180: icmp_seq=4 ttl=64 time=0.243 ms


I have unplugged the cable from the em1 interface. We can see the em1 interface DOWN and the p4p1 interface UP:

[root@SRV network-scripts]# ip addr sh
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: em1:  mtu 1500 qdisc mq master bond1 state DOWN group default qlen 1000
    link/ether bc:97:e1:5b:e4:50 brd ff:ff:ff:ff:ff:ff
3: em3:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether bc:97:e1:5b:e4:4e brd ff:ff:ff:ff:ff:ff
4: em2:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether bc:97:e1:5b:e4:51 brd ff:ff:ff:ff:ff:ff
5: em4:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether bc:97:e1:5b:e4:4f brd ff:ff:ff:ff:ff:ff
6: p4p1:  mtu 1500 qdisc mq master bond1 state UP group default qlen 1000
    link/ether bc:97:e1:5b:e4:50 brd ff:ff:ff:ff:ff:ff
7: p4p2:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 3c:fd:fe:85:0d:31 brd ff:ff:ff:ff:ff:ff
8: p4p3:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 3c:fd:fe:85:0d:32 brd ff:ff:ff:ff:ff:ff
9: p4p4:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 3c:fd:fe:85:0d:33 brd ff:ff:ff:ff:ff:ff
15: bond1:  mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether bc:97:e1:5b:e4:50 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.180/24 brd 192.168.1.255 scope global noprefixroute bond1
       valid_lft forever preferred_lft forever
    inet6 fe80::b4f9:e44d:25fc:3a6/64 scope link noprefixroute
       valid_lft forever preferred_lft forever


Pinging the server is still OK:

[ansible@linux-ansible / ]$ ping 192.168.1.180
PING 192.168.1.180 (192.168.1.180) 56(84) bytes of data.
64 bytes from 192.168.1.180: icmp_seq=1 ttl=64 time=0.234 ms
64 bytes from 192.168.1.180: icmp_seq=2 ttl=64 time=0.256 ms
64 bytes from 192.168.1.180: icmp_seq=3 ttl=64 time=0.257 ms
64 bytes from 192.168.1.180: icmp_seq=4 ttl=64 time=0.245 ms


I have then plugged the cable back into the em1 interface and unplugged the cable from the p4p1 interface. We can see the em1 interface now UP again and the p4p1 interface DOWN:

[root@SRV network-scripts]# ip addr sh
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: em1:  mtu 1500 qdisc mq master bond1 state UP group default qlen 1000
    link/ether bc:97:e1:5b:e4:50 brd ff:ff:ff:ff:ff:ff
3: em3:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether bc:97:e1:5b:e4:4e brd ff:ff:ff:ff:ff:ff
4: em2:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether bc:97:e1:5b:e4:51 brd ff:ff:ff:ff:ff:ff
5: em4:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether bc:97:e1:5b:e4:4f brd ff:ff:ff:ff:ff:ff
6: p4p1:  mtu 1500 qdisc mq master bond1 state DOWN group default qlen 1000
    link/ether bc:97:e1:5b:e4:50 brd ff:ff:ff:ff:ff:ff
7: p4p2:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 3c:fd:fe:85:0d:31 brd ff:ff:ff:ff:ff:ff
8: p4p3:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 3c:fd:fe:85:0d:32 brd ff:ff:ff:ff:ff:ff
9: p4p4:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 3c:fd:fe:85:0d:33 brd ff:ff:ff:ff:ff:ff
15: bond1:  mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether bc:97:e1:5b:e4:50 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.180/24 brd 192.168.1.255 scope global noprefixroute bond1
       valid_lft forever preferred_lft forever
    inet6 fe80::b4f9:e44d:25fc:3a6/64 scope link noprefixroute
       valid_lft forever preferred_lft forever


Pinging the server is still OK:

[ansible@linux-ansible / ]$ ping 192.168.1.180
PING 192.168.1.180 (192.168.1.180) 56(84) bytes of data.
64 bytes from 192.168.1.180: icmp_seq=1 ttl=64 time=0.159 ms
64 bytes from 192.168.1.180: icmp_seq=2 ttl=64 time=0.219 ms
64 bytes from 192.168.1.180: icmp_seq=3 ttl=64 time=0.362 ms
64 bytes from 192.168.1.180: icmp_seq=4 ttl=64 time=0.236 ms
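
Besides watching the interface state with ip addr, the bonding driver itself reports which slave is currently active, which is a quick way to verify the failover. A minimal sketch, assuming the bond1 device above runs in active-backup mode (the output line is illustrative):

[root@SRV network-scripts]# grep "Currently Active Slave" /proc/net/bonding/bond1
Currently Active Slave: em1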


And what about the ODA?

This configuration was set up at a customer site running DELL servers. I have deployed several ODAs for other customers, and the question of fault tolerance between several network cards comes up regularly. Unfortunately, even though the ODA runs the Oracle Linux operating system, such a configuration is not supported on the appliance. The appliance only supports active-backup bonding between ports of the same network card. Additional network cards on the ODA are used to provide additional network connections. Last but not least, LACP is not supported on the appliance either.

The article Building a network bonding between 2 cards on Oracle Linux appeared first on Blog dbi services.

Password rolling change before Oracle 21c

Sun, 2020-12-27 13:35
By Franck Pachot

You may have read about Gradual Password Rollover usage from Mouhamadou Diaw and about some internals from Rodrigo Jorge. But it works only in 21c which, for the moment, is available only in the cloud, in Autonomous Database and DBaaS (and here I've encountered some problems, apparently because of a bug, when using SQL*Net native encryption). Your production is probably not on 21c yet anyway. However, here is how you can achieve a similar goal in 12c, 18c or 19c: being able to connect with two passwords during the time window where you change the password, in a rolling fashion, in the application server configuration.

Proxy User

If your application still connects as the application owner, you are doing it wrong. Even when it needs to be connected to the application schema by default, and even when you cannot do an "alter session set current_schema", you don't have to use this user for authentication. And this is really easy with proxy users. Consider the application owner as a schema, not as a user to connect with.

My application is in the DEMO schema and I will not use the DEMO credentials. You can set an impossible password or, better, in 18c, set no password at all. I'll use proxy user authentication to connect to this DEMO user:


19:28:49 DEMO@atp1_tp> grant create session to APP2020 identified by "2020 was a really Bad Year!";
Grant succeeded.

19:28:50 DEMO@atp1_tp> alter user DEMO grant connect through APP2020;
User DEMO altered.

The APP2020 user is the one I'll use. I named it 2020 because I want to change the credentials every year and, as I don't have the gradual password rollover feature, this means changing the user to connect with.


19:28:50 DEMO@atp1_tp> connect APP2020/"2020 was a really Bad Year!"@atp1_tp
Connected.
19:28:52 APP2020@atp1_tp> show user
USER is "APP2020"

This user can connect as usual, as it has the CREATE SESSION privilege. There is a way to prevent this and allow PROXY ONLY CONNECT, but it is unfortunately not documented (Miguel Anjo has written about this), so it is better not to use it; the reported syntax is sketched below for reference.
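
As it is undocumented, treat the following purely as a sketch of what those posts describe, and use it at your own risk:

SQL> alter user APP2020 proxy only connect;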

However, the most important is:


19:28:52 APP2020@atp1_tp> connect APP2020[DEMO]/"2020 was a really Bad Year!"@atp1_tp
Connected.

19:28:53 DEMO@atp1_tp> show user
USER is "DEMO"

With a proxy connection, in addition to the proxy user credentials, I mention the final user I want to connect to through this proxy user. Now I'm in the exact same state as if I had connected with the DEMO user.

No authentication

19:28:54 ADMIN@atp1_tp> alter user DEMO no authentication;
User DEMO altered.

As we don’t connect through this user anymore (and once I’m sure no application uses it) the best is to set it with NO AUTHENTICATION.

New proxy user

Now that the application uses this APP2020 for months, I want to change the password. I’ll add a new proxy user for that:


19:28:54 ADMIN@atp1_tp> show user
USER is "ADMIN"

19:28:53 ADMIN@atp1_tp> grant create session to APP2021 identified by "Best Hopes for 2021 :)";
Grant succeeded.

19:28:54 ADMIN@atp1_tp> alter user DEMO grant connect through APP2021;
User DEMO altered.

Here I have another proxy user that can be used to connect to DEMO, in addition to the existing one.


19:28:54 ADMIN@atp1_tp> connect APP2020[DEMO]/"2020 was a really Bad Year!"@atp1_tp
Connected.

19:28:55 DEMO@atp1_tp> show user
USER is "DEMO"

19:28:55 DEMO@atp1_tp> connect APP2021[DEMO]/"Best Hopes for 2021 :)"@atp1_tp
Connected.

19:28:56 DEMO@atp1_tp> show user
USER is "DEMO"

During this time, I can use both credentials. This gives me enough time to change all application server configuration one by one, without any downtime for the application.

Lock previous account

19:30:00 ADMIN@atp1_tp> 
 select username,account_status,last_login,password_change_date,proxy_only_connect 
 from dba_users where username like 'APP____';

   USERNAME    ACCOUNT_STATUS                                       LAST_LOGIN    PASSWORD_CHANGE_DATE    PROXY_ONLY_CONNECT
___________ _________________ ________________________________________________ _______________________ _____________________
APP2020     OPEN              27-DEC-20 07.28.55.000000000 PM EUROPE/ZURICH    27-DEC-20               N
APP2021     OPEN              27-DEC-20 07.28.56.000000000 PM EUROPE/ZURICH    27-DEC-20               N

After a while, I can validate that the old user is not used anymore. If you have a connection recycling duration in the connection pool (you should have one), you can rely on the last login.


19:30:00 ADMIN@atp1_tp> alter user APP2020 account lock;
User APP2020 altered.

Before dropping it, just lock the account: it is easier to keep track of it and to unlock it quickly if anyone encounters a problem.


19:30:00 ADMIN@atp1_tp> connect APP2020[DEMO]/"2020 was a really Bad Year!"@atp1_tp
Error starting at line : 30 File @ /home/opc/demo/tmp/proxy_to_rollover.sql
In command -
  connect ...
Error report -
Connection Failed
  USER          = APP2020[DEMO]
  URL           = jdbc:oracle:thin:@atp1_tp
  Error Message = ORA-28000: The account is locked.
Commit

If someone tries to connect with the old password, they will know that the account is locked.


19:30:01 @> connect APP2021[DEMO]/"Best Hopes for 2021 :)"@atp1_tp
Connected.
19:30:02 DEMO@atp1_tp> show user
USER is "DEMO"

Once the old user is locked, only the new one is able to connect, with the new credentials. As this operation can be done with no application downtime, you can do it frequently. From a security point of view, you must change passwords frequently. For end-user passwords, you can set a lifetime and a grace period. But not for system users, as the expiry warning may not be caught by the application. Better to change them proactively.
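
For end-user accounts, such a policy can be declared in a profile. A minimal sketch with a hypothetical profile name, arbitrary values in days, and SCOTT standing for any end-user account:

SQL> create profile END_USERS limit password_life_time 90 password_grace_time 7;
SQL> alter user SCOTT profile END_USERS;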

The article Password rolling change before Oracle 21c appeared first on Blog dbi services.

Cluster level encryption for PostgreSQL 14

Sat, 2020-12-26 09:33

The discussions about how and why TDE (Transparent Data Encryption) should be implemented in PostgreSQL go back several years. You can have a look at these two more recent threads to get an idea of how much discussion happened around that feature:

Finally, an essential part of that infrastructure was committed, and I am sure many people have been waiting for that to appear in plain community PostgreSQL. Let's have a quick look at how it works and whether it is easy to play with.

To get an encrypted cluster, you need to specify that when you initialize the cluster with initdb. One additional requirement is that PostgreSQL was compiled with "--with-openssl":

postgres@debian10pg:/home/postgres/ [pgdev] pg_config | grep openssl
CONFIGURE =  '--prefix=/u01/app/postgres/product/DEV/db_1/' '--exec-prefix=/u01/app/postgres/product/DEV/db_1/' '--bindir=/u01/app/postgres/product/DEV/db_1//bin' '--libdir=/u01/app/postgres/product/DEV/db_1//lib' '--sysconfdir=/u01/app/postgres/product/DEV/db_1//etc' '--includedir=/u01/app/postgres/product/DEV/db_1//include' '--datarootdir=/u01/app/postgres/product/DEV/db_1//share' '--datadir=/u01/app/postgres/product/DEV/db_1//share' '--with-pgport=5432' '--with-perl' '--with-python' '--with-openssl' '--with-pam' '--with-ldap' '--with-libxml' '--with-libxslt' '--with-segsize=2' '--with-blocksize=8' '--with-llvm' 'LLVM_CONFIG=/usr/bin/llvm-config-7' '--with-systemd'

If that is the case, you can initialize a new cluster and tell initdb how to get the encryption key:

postgres@debian10pg:/home/postgres/ [pgdev] initdb --help | grep cluster-key-command
  -c  --cluster-key-command=COMMAND

If this key is provided, two internal keys are generated, one for the table and index files (and any temporary objects) and one for the WAL files:

postgres@debian10pg:/home/postgres/ [pgdev] initdb --pgdata=/var/tmp/pgenc --cluster-key-command=/home/postgres/get_key.sh
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.
Cluster file encryption is enabled.

creating directory /var/tmp/pgenc ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Europe/Zurich
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    pg_ctl -D /var/tmp/pgenc -l logfile start

The command to get the key in this example is quite trivial:

postgres@debian10pg:/home/postgres/ [pgdev] cat /home/postgres/get_key.sh
echo "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"

In a real setup, the key should of course come from an external key store; a possible sketch of such a script is shown below.
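
For example, get_key.sh could retrieve the key from a secret manager instead of echoing a constant. This is only a sketch; the use of HashiCorp Vault, the secret path and the field name are assumptions:

#!/bin/bash
# sketch: fetch the 64 hexadecimal character cluster key from an external key store
# (HashiCorp Vault here; secret path and field name are assumptions)
vault kv get -field=cluster_key secret/postgres/pgenc

Let's try to start the cluster: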

postgres@debian10pg:/home/postgres/ [pgdev] export PGPORT=8888
postgres@debian10pg:/home/postgres/ [pgdev] pg_ctl -D /var/tmp/pgenc/ start
waiting for server to start....2020-12-26 16:11:12.220 CET [7106] LOG:  starting PostgreSQL 14devel on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2020-12-26 16:11:12.221 CET [7106] LOG:  listening on IPv6 address "::1", port 8888
2020-12-26 16:11:12.221 CET [7106] LOG:  listening on IPv4 address "127.0.0.1", port 8888
2020-12-26 16:11:12.234 CET [7106] LOG:  listening on Unix socket "/tmp/.s.PGSQL.8888"
2020-12-26 16:11:12.250 CET [7109] LOG:  database system was shut down at 2020-12-26 16:08:34 CET
2020-12-26 16:11:12.274 CET [7106] LOG:  database system is ready to accept connections
 done
server started

Why does that work? We did not provide the key at startup time, so PostgreSQL somehow must know how to get it. Actually, there is a new parameter that was automatically set to the command we specified when we initialized the cluster:

postgres@debian10pg:/home/postgres/ [pgdev] grep cluster_key /var/tmp/pgenc/postgresql.conf 
cluster_key_command = '/home/postgres/get_key.sh'

If we remove it and start again, it will not work:

postgres@debian10pg:/home/postgres/ [pgdev] psql -c "alter system set cluster_key_command=''" postgres
ALTER SYSTEM
postgres@debian10pg:/home/postgres/ [pgdev] grep cluster_key /var/tmp/pgenc/postgresql.auto.conf 
cluster_key_command = ''
postgres@debian10pg:/home/postgres/ [pgdev] pg_ctl -D /var/tmp/pgenc/ stop
2020-12-26 16:15:29.457 CET [7106] LOG:  received fast shutdown request
waiting for server to shut down....2020-12-26 16:15:29.467 CET [7106] LOG:  aborting any active transactions
2020-12-26 16:15:29.469 CET [7106] LOG:  background worker "logical replication launcher" (PID 7115) exited with exit code 1
2020-12-26 16:15:29.473 CET [7110] LOG:  shutting down
2020-12-26 16:15:29.534 CET [7106] LOG:  database system is shut down
 done
server stopped
16:15:29 postgres@debian10pg:/home/postgres/ [pgdev] pg_ctl -D /var/tmp/pgenc/ start
waiting for server to start....2020-12-26 16:15:31.762 CET [7197] LOG:  starting PostgreSQL 14devel on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2020-12-26 16:15:31.763 CET [7197] LOG:  listening on IPv6 address "::1", port 8888
2020-12-26 16:15:31.763 CET [7197] LOG:  listening on IPv4 address "127.0.0.1", port 8888
2020-12-26 16:15:31.778 CET [7197] LOG:  listening on Unix socket "/tmp/.s.PGSQL.8888"
2020-12-26 16:15:31.786 CET [7197] FATAL:  cluster key must be 64 hexadecimal characters
2020-12-26 16:15:31.787 CET [7197] LOG:  database system is shut down
 stopped waiting
pg_ctl: could not start server
Examine the log output.

The two keys that have been generated when the cluster was initialized can be found in $PGDATA:

postgres@debian10pg:/var/tmp/pgenc/ [pgdev] ls -la pg_cryptokeys/live/
total 16
drwx------ 2 postgres postgres 4096 Dec 26 16:08 .
drwx------ 3 postgres postgres 4096 Dec 26 16:08 ..
-rw------- 1 postgres postgres   72 Dec 26 16:08 0
-rw------- 1 postgres postgres   72 Dec 26 16:08 1

The reason for two separate keys is that a primary and a replica cluster can have different keys for the table, index and all other files generated during database operations, but still share the same key for the WAL files. By the way, pg_controldata will also tell you if a cluster is encrypted:

postgres@debian10pg:/var/tmp/pgenc/base/12833/ [pgdev] pg_controldata -D /var/tmp/pgenc/ | grep encr
File encryption key length:           128

That really is a nice and much appreciated feature. Currently only the whole cluster can be encrypted, but I am sure that is sufficient for most use cases. Let's hope that it will not get reverted for any reason.

The article Cluster level encryption for PostgreSQL 14 appeared first on Blog dbi services.

Running two Patroni on one host using an existing etcd

Wed, 2020-12-23 07:00

Have you ever asked yourself how to create a second Patroni PostgreSQL cluster on an existing server using the existing etcd? My first idea was to study the Patroni documentation, but unfortunately without much success. This post should help you identify the changes you have to make on the hosts to run two parallel Patroni clusters using an existing etcd.

Starting Point

First we want to take a short look at the existing infrastructure to get a better overview of where we are starting from.

There is already a Patroni installation and etcd on the servers, as well as one PostgreSQL cluster streaming from the primary to a replica. We are not using two replicas in this example, but it works the same way for any number of replicas.

etcd

As etcd is already running on the hosts, let's start with this one. And here we already have good news! You don't have to change anything on the etcd side. Just leave your configuration as it is.

postgres@postgres_primary:/home/postgres/.local/bin/ [PGTEST] cat /u01/app/postgres/local/dmk/dmk_postgres/etc/etcd.conf
name: postgres-primary
data-dir: /u02/postgres/pgdata/etcd
initial-advertise-peer-urls: http://192.168.22.33:2380
listen-peer-urls: http://192.168.22.33:2380
listen-client-urls: http://192.168.22.33:2379,http://localhost:2379
advertise-client-urls: http://192.168.22.33:2379
initial-cluster: postgres-primary=http://192.168.22.33:2380, postgres-stby=http://192.168.22.34:2380, postgres-stby2=http://192.168.22.35:2380

patroni.yml

Let’s go on with the patroni.yml. As there is already a Patroni running on that server you need to create another patroni.yml, let’s say patroni_pgtest.yml. To keep it simple and not reinventing the wheel, just copy your existing yml file

postgres@postgres_primary:/home/postgres/ [PGTEST] cd /u01/app/postgres/local/dmk/dmk_postgres/etc
postgres@postgres_primary:/u01/app/postgres/local/dmk/dmk_postgres/etc/ [PGTEST] cp patroni.yml patroni_pgtest.yml

Once we have the new patroni_pgtest.yml, we need to adjust some entries in this file. The most important entries to change are "namespace" and "scope". Without changing these, your new Patroni service won't create a new PostgreSQL cluster:

scope: PGTEST
namespace: /pgtest/
name: pgtest1

The next parameters to change are the restapi ones. You can keep the IP address, but you have to adjust the port; otherwise the service will fail to start with an "Address already in use" error.

restapi:
  listen: 192.168.22.33:8009
  connect_address: 192.168.22.33:8009

Once this is done, the PostgreSQL parameters of course need to be adjusted so that they do not use the same port and cluster name as the already existing cluster. The PGDATA directory needs to be adjusted as well.

...
...
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        ...
        ...
        port: 5410
...
...
postgresql:
  listen: 192.168.22.33:5410
  connect_address: 192.168.22.33:5410
  data_dir: /u02/postgres/pgdata/13/PGTEST/
...
...
Patroni service

Now that we have changed all our parameters, we can create a second Patroni service named patroni_pgtest.service. Be sure to point to the correct patroni_pgtest.yml:

postgres@postgres_primary:/home/postgres/ [PGTEST] sudo vi /etc/systemd/system/patroni_pgtest.service
#
# systemd integration for patroni
#

[Unit]
Description=dbi services patroni service
After=etcd.service syslog.target network.target

[Service]
User=postgres
Type=simple
ExecStart=/u01/app/postgres/local/dmk/bin/patroni /u01/app/postgres/local/dmk/etc/patroni_pgtest.yml
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=process
Restart=no

[Install]
WantedBy=multi-user.target

Now we can start and enable the service:

postgres@postgres_primary:/home/postgres/ [PGTEST] sudo systemctl start patroni_pgtest.service
postgres@postgres_primary:/home/postgres/ [PGTEST] sudo systemctl status patroni_pgtest.service
● patroni_pgtest.service - dbi services patroni service
   Loaded: loaded (/etc/systemd/system/patroni_pgtest.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-12-22 20:07:46 CET; 9h ago
 Main PID: 4418 (patroni)
   CGroup: /system.slice/patroni.service
           ├─4418 /usr/bin/python2 /u01/app/postgres/local/dmk/dmk_postgres/bin/patroni /u01/app/postgres/local/dmk/dmk_postgres/etc/patroni_pgtest.yml
           ├─5258 /u01/app/postgres/product/PG13/db_1/bin/postgres -D /u02/pgdata/13/PG1/ --config-file=/u02/postgres/pgdata/13/PG1/postgresql.conf --listen_addresses=192.168.22.33 --max_worker_processes=8 --max_locks_per_tra...
           ├─5282 postgres: PG1: logger process
           ├─5292 postgres: PG1: checkpointer process
           ├─5294 postgres: PG1: writer process
           ├─5296 postgres: PG1: stats collector process
           ├─6171 postgres: PG1: postgres postgres 192.168.22.33(50492) idle
           ├─6473 postgres: PG1: wal writer process
           ├─6474 postgres: PG1: autovacuum launcher process

Dec 23 05:36:21 postgres_primary patroni[4418]: 2020-12-23 05:36:21,032 INFO: Lock owner: postgres_primary; I am postgres_primary
Dec 23 05:36:21 postgres_primary patroni[4418]: 2020-12-23 05:36:21,047 INFO: no action.  i am the leader with the lock

postgres@postgres_primary:/home/postgres/ [PGTEST] sudo systemctl enable patroni_pgtest.service

As the cluster is now running on the primary server, you can do exactly the same steps on your replica server(s). Be sure to set all ports and IPs correctly.

Conclusion

Even if it seems easy to set up a second Patroni on a server, it took some time to find out what exactly needs to be changed. But once you know all that, it's really simple. Just keep in mind that you have to use a port for your PostgreSQL cluster that is not already in use.
Furthermore, if you are using our DMK on your host, be sure to call 'patronictl list' with the correct configuration file and the complete path to patronictl. DMK gives you an alias for patronictl which will only work for the first Patroni cluster created on the server.

postgres@postgres_primary:/home/postgres/ [PGTEST] cd .local/bin
postgres@postgres_primary:/home/postgres/.local/bin [PGTEST] patronictl -c /u01/app/postgres/local/dmk/etc/patroni_pgtest.yml list
+------------+------------------+---------------+--------+---------+-----+-----------+
| Cluster    |     Member       |      Host     |  Role  |  State  |  TL | Lag in MB |
+------------+------------------+---------------+--------+---------+-----+-----------+
|   PGTEST   | postgres_primary | 192.168.22.33 | Leader | running | 528 |       0.0 |
|   PGTEST   | postgres_replica | 192.168.22.34 |        | running | 528 |       0.0 |
+------------+------------------+---------------+--------+---------+-----+-----------+

In case you’re not using DMK, you have to add the configuration file in any case. You also have to set the correct PATH variable or use the complete path to call patronictl.

The article Running two Patroni on one host using an existing etcd appeared first on Blog dbi services.

SQL Server TCP: Having both Dynamic Ports and Static Port configured

Mon, 2020-12-21 11:43
Introduction

Have you ever seen an SQL Server instance configured to listen on both “TCP Dynamic Ports” and “TCP (static) Port”?

This kind of configuration can be caused by the following scenario:

  1. A named instance is installed. By default, it is configured to use dynamic ports.
  2. Someone wants to configure the instance to listen on a fixed port and sets the "TCP Port" value.
  3. "TCP Dynamic Ports" is set to the value "0", thinking this would disable the dynamic ports.

The documentation states that a value of “0” is actually enabling “TCP Dynamic Ports”.

If the TCP Dynamic Ports dialog box contains 0, indicating the Database Engine is listening on dynamic ports

After a service restart, SQL Server will listen on a port like 50119 for example.
You end up with the following configuration.

So what’s happening to SQL Server with this configuration?

What SQL Server is listening on?

Well, I could not find anything related to this particular case in the Microsoft documentation.
If we look at the SQL Server Error Log we can see that the instance is listening on both ports: the dynamically chosen one and the static port.

We can confirm this by trying a connection using SSMS:

What TCP port is used by clients connections?

But, are both ports actually used by client connections to the server?
From SQL we can see the established connections and their TCP port using this query:

select distinct local_tcp_port
from sys.dm_exec_connections
where net_transport = 'TCP'

This could also be seen with netstat:
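
As the screenshot is not reproduced here, here is a sketch of an equivalent command line check, filtering on the two ports from this example (the dynamic 50119 and the static 15001):

C:\>netstat -ano | findstr "50119 15001"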

Looking at this information I see no connection at all using the dynamically assigned port.
Only the static port is used.

SQL Browser

My guess is that the SQL Server Browser gives priority to the static port and always returns this port to clients. I didn't find any information online about this behavior, but it makes sense.

When a client wants to connect to an instance using "server\instancename", an exchange is done with the server over UDP using the SQL Server Resolution Protocol. This is why you should open UDP port 1434 in your firewall if you need the SQL Browser.
For details about this protocol, you can read the specifications here.
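
If needed, the firewall rule can be created from the command line; this is a sketch and the rule name is arbitrary:

C:\>netsh advfirewall firewall add rule name="SQL Server Browser (UDP 1434)" dir=in action=allow protocol=UDP localport=1434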

Doing some tests with Wireshark and a UDP filter we can see the client asking about “inst1”, my instance name.

The server response contains some information about the instance, the most important being the TCP port, here the static port: 15001.

Conclusion

I think this configuration should be avoided because it doesn’t seem to add any benefits and could bring some confusion.
If you use a static TCP port for your instance, leave the “Dynamic TCP Port” blank.

 

The article SQL Server TCP: Having both Dynamic Ports and Static Port configured appeared first on Blog dbi services.

Oracle SPD status on two learning paths

Sat, 2020-12-19 10:57
By Franck Pachot

I have written a lot about SQL Plan Directives, which appeared in 12c. They were used by default and, because of some side effects at the time of 12cR1 with legacy applications that were parsing too much, they have been disabled by default in 12cR2. Today, they are probably not used enough because of their bad reputation from those times. But for data warehouses, they should be the default in my opinion.

There is a behaviour that surprised me initially and I thought it was a bug but, after 5 years, the verdict is: expected behaviour (Bug 20311655 : SQL PLAN DIRECTIVE INVALIDATED BY STATISTICS FEEDBACK). The name of the bug is my fault: I initially thought that the statistics feedback had been wrongly interpreted as HAS_STATS. But actually, this behaviour has nothing to do with it: it was visible there only because the re-optimization had triggered a new hard parse, which changed the state. Any other query on similar predicates would have done the same.

And this is what I’m showing here: when the misestimate cannot be solved by extended statistics, the learning path of SQL Plan Directive have to go through this HAS_STATS state where misestimate will occur again. I’m mentioning the fact that extended statistics can help or not, and this is anticipated by the optimizer. For this reason, I’ve run two sets of examples: one with a predicate where no column group can help, and one where extended statistics can be created.

SQL> show parameter optimizer_adaptive
NAME                              TYPE    VALUE 
--------------------------------- ------- ----- 
optimizer_adaptive_plans          boolean TRUE
optimizer_adaptive_reporting_only boolean FALSE
optimizer_adaptive_statistics     boolean TRUE

Since 12.2, adaptive statistics are disabled by default: SQL Plan Directives are created but not used. This is fine for OLTP databases that are upgraded from previous versions. However, for data warehouses, analytics, ad-hoc queries and reporting, enabling adaptive statistics may help a lot when the static statistics are not sufficient to optimize complex queries.

SQL> alter session set optimizer_adaptive_statistics=true;

Session altered.

I’m enabling adaptive statistics for my session.

SQL> exec for r in (select directive_id from dba_sql_plan_dir_objects where owner=user) loop begin dbms_spd.drop_sql_plan_directive(r.directive_id); exception when others then raise; end; end loop;

I’m removing all SQL Plan Directives in my lab to build a reproducible test case.

SQL> create table DEMO pctfree 99 as select mod(rownum,2) a,mod(rownum,2) b,mod(rownum,2) c,mod(rownum,2) d from dual connect by level <=1000;

Table DEMO created.

This is my test table, built on purpose with a special data distribution: each row has either 0 or 1 in all four columns.

SQL> alter session set statistics_level=all;

Session altered.

I’m profiling down to execution plan operation in order to see all execution statistics

SPD learning path {E}:
USABLE(NEW)->SUPERSEDED(HAS_STATS)->USABLE(PERMANENT)
SQL> select count(*) c1 from demo where a+b+c+d=0;

    C1 
______ 
   500 

Here is a query where dynamic sampling can help to get better statistics on selectivity, but where no static statistics can help, not even on a column group (extended statistics on expressions are not considered for SQL Plan Directives, even in 21c).

SQL> select * from dbms_xplan.display_cursor(format=>'allstats last');

                                                                                PLAN_TABLE_OUTPUT 
_________________________________________________________________________________________________ 
SQL_ID  fjcbm5x4014mg, child number 0                                                             
-------------------------------------                                                             
select count(*) c1 from demo where a+b+c+d=0                                                      
                                                                                                  
Plan hash value: 2180342005                                                                       
                                                                                                  
----------------------------------------------------------------------------------------------    
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |    
----------------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.03 |     253 |    250 |    
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.03 |     253 |    250 |    
|*  2 |   TABLE ACCESS FULL| DEMO |      1 |     10 |    500 |00:00:00.03 |     253 |    250 |    
----------------------------------------------------------------------------------------------    
                                                                                                  
Predicate Information (identified by operation id):                                               
---------------------------------------------------                                               
                                                                                                  
   2 - filter("A"+"B"+"C"+"D"=0)      

As expected, the estimation (10 rows) is far from the actual number of rows (500). This statement is flagged for re-optimisation with cardinality feedback, but I'm interested in different SQL statements here.

SQL> exec dbms_spd.flush_sql_plan_directive;

PL/SQL procedure successfully completed.

SQL> select state, extract(notes,'/spd_note/internal_state/text()') internal_state, extract(notes,'/spd_note/spd_text/text()') spd_text from dba_sql_plan_directives where directive_id in (select directive_id from dba_sql_plan_dir_objects where owner=user) and type='DYNAMIC_SAMPLING';


    STATE    INTERNAL_STATE                      SPD_TEXT 
_________ _________________ _____________________________ 
USABLE    NEW               {E(DEMO.DEMO)[A, B, C, D]}    

A SQL Plan Directive has been created to keep the information that equality predicates on columns A, B, C and D are misestimated. The directive is in internal state NEW. The visible state is USABLE which means that dynamic sampling will be used by queries with a similar predicate on those columns.

SQL> select count(*) c2 from demo where a+b+c+d=0;

    C2 
______ 
   500 

SQL> select * from dbms_xplan.display_cursor(format=>'allstats last');

                                                                       PLAN_TABLE_OUTPUT 
________________________________________________________________________________________ 
SQL_ID  5sg7b9jg6rj2k, child number 0                                                    
-------------------------------------                                                    
select count(*) c2 from demo where a+b+c+d=0                                             
                                                                                         
Plan hash value: 2180342005                                                              
                                                                                         
-------------------------------------------------------------------------------------    
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |    
-------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |     253 |    
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |     253 |    
|*  2 |   TABLE ACCESS FULL| DEMO |      1 |    500 |    500 |00:00:00.01 |     253 |    
-------------------------------------------------------------------------------------    
                                                                                         
Predicate Information (identified by operation id):                                      
---------------------------------------------------                                      
                                                                                         
   2 - filter("A"+"B"+"C"+"D"=0)                                                         
                                                                                         
Note                                                                                     
-----                                                                                    
   - dynamic statistics used: dynamic sampling (level=AUTO)                              
   - 1 Sql Plan Directive used for this statement      

As expected, a different query (note that I changed the column alias C1 to C2, but anything can be different as long as there's an equality predicate involving the same columns) has accurate estimations (E-Rows=A-Rows) because of dynamic sampling (dynamic statistics), thanks to the SQL Plan Directive that was used.

SQL> exec dbms_spd.flush_sql_plan_directive;

PL/SQL procedure successfully completed.

SQL> select state, extract(notes,'/spd_note/internal_state/text()') internal_state, extract(notes,'/spd_note/spd_text/text()') spd_text from dba_sql_plan_directives where directive_id in (select directive_id from dba_sql_plan_dir_objects where owner=user) and type='DYNAMIC_SAMPLING';

        STATE    INTERNAL_STATE                      SPD_TEXT 
_____________ _________________ _____________________________ 
SUPERSEDED    HAS_STATS         {E(DEMO.DEMO)[A, B, C, D]}    

This is the important part and initially I thought it was a bug, because SUPERSEDED means that the next query on similar columns will not do dynamic sampling anymore, and will then have bad estimations. HAS_STATS does not mean that we have correct estimations here, but only that there are no additional static statistics that can help, because the optimizer has detected an expression ("A"+"B"+"C"+"D"=0) and automatic statistics extensions do not consider expressions.

SQL> select count(*) c3 from demo where a+b+c+d=0;

    C3 
______ 
   500 


SQL> select * from dbms_xplan.display_cursor(format=>'allstats last');

                                                                       PLAN_TABLE_OUTPUT 
________________________________________________________________________________________ 
SQL_ID  62cf5zwt4rwgj, child number 0                                                    
-------------------------------------                                                    
select count(*) c3 from demo where a+b+c+d=0                                             
                                                                                         
Plan hash value: 2180342005                                                              
                                                                                         
-------------------------------------------------------------------------------------    
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |    
-------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |     253 |    
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |     253 |    
|*  2 |   TABLE ACCESS FULL| DEMO |      1 |     10 |    500 |00:00:00.01 |     253 |    
-------------------------------------------------------------------------------------    
                                                                                         
Predicate Information (identified by operation id):                                      
---------------------------------------------------                                      
                                                                                         
   2 - filter("A"+"B"+"C"+"D"=0)    

We are still in the learning phase and, as you can see, even if we know that there is a misestimate (an SPD has been created), adaptive statistics try to avoid dynamic sampling: no SPD usage is mentioned in the notes, and we are back to the misestimate of E-Rows=10.

SQL> exec dbms_spd.flush_sql_plan_directive;

PL/SQL procedure successfully completed.

SQL> select state, extract(notes,'/spd_note/internal_state/text()') internal_state, extract(notes,'/spd_note/spd_text/text()') spd_text from dba_sql_plan_directives where directive_id in (select directive_id from dba_sql_plan_dir_objects where owner=user) and type='DYNAMIC_SAMPLING';

    STATE    INTERNAL_STATE                      SPD_TEXT 
_________ _________________ _____________________________ 
USABLE    PERMANENT         {E(DEMO.DEMO)[A, B, C, D]}    

The HAS_STATS state and the misestimate were temporary. Now the optimizer has validated that, with all possible static statistics available (HAS_STATS), we still have a misestimate, and it has moved the SPD status to PERMANENT: end of the learning phase, we will permanently do dynamic sampling for this kind of query.

SQL> select count(*) c4 from demo where a+b+c+d=0;

    C4 
______ 
   500 


SQL> select * from dbms_xplan.display_cursor(format=>'allstats last');

                                                                       PLAN_TABLE_OUTPUT 
________________________________________________________________________________________ 
SQL_ID  65ufgd70n61nh, child number 0                                                    
-------------------------------------                                                    
select count(*) c4 from demo where a+b+c+d=0                                             
                                                                                         
Plan hash value: 2180342005                                                              
                                                                                         
-------------------------------------------------------------------------------------    
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |    
-------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |     253 |    
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |     253 |    
|*  2 |   TABLE ACCESS FULL| DEMO |      1 |    500 |    500 |00:00:00.01 |     253 |    
-------------------------------------------------------------------------------------    
                                                                                         
Predicate Information (identified by operation id):                                      
---------------------------------------------------                                      
                                                                                         
   2 - filter("A"+"B"+"C"+"D"=0)                                                         
                                                                                         
Note                                                                                     
-----                                                                                    
   - dynamic statistics used: dynamic sampling (level=AUTO)                              
   - 1 Sql Plan Directive used for this statement                                        
                                                            

Yes, it has an overhead at hard parse time, but it helps to get better estimations and therefore faster execution plans. The execution plan notes show that dynamic sampling is done because of the SPD usage.

SPD learning path {EC}:
USABLE(NEW)->USABLE(MISSING_STATS)->SUPERSEDED(HAS_STATS)

I’m now running a query where the misestimate can be avoided with additional statistics: column group statistics extension.

SQL> select count(*) c1 from demo where a=0 and b=0 and c=0 and d=0;

    C1 
______ 
   500 

SQL> select * from dbms_xplan.display_cursor(format=>'allstats last');

                                                                       PLAN_TABLE_OUTPUT 
________________________________________________________________________________________ 
SQL_ID  2x5j71630ua0z, child number 0                                                    
-------------------------------------                                                    
select count(*) c1 from demo where a=0 and b=0 and c=0 and d=0                           
                                                                                         
Plan hash value: 2180342005                                                              
                                                                                         
-------------------------------------------------------------------------------------    
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |    
-------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |     253 |    
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |     253 |    
|*  2 |   TABLE ACCESS FULL| DEMO |      1 |     63 |    500 |00:00:00.01 |     253 |    
-------------------------------------------------------------------------------------    
                                                                                         
Predicate Information (identified by operation id):                                      
---------------------------------------------------                                      
                                                                                         
   2 - filter(("A"=0 AND "B"=0 AND "C"=0 AND "D"=0))   

I have a misestimate here (E-Rows much lower than A-Rows) because the optimizer doesn't know the correlation between A, B, C and D.

SQL> exec dbms_spd.flush_sql_plan_directive;

PL/SQL procedure successfully completed.

SQL> select state, extract(notes,'/spd_note/internal_state/text()') internal_state, extract(notes,'/spd_note/spd_text/text()') spd_text from dba_sql_plan_directives where directive_id in (select directive_id from dba_sql_plan_dir_objects where owner=user) and type='DYNAMIC_SAMPLING';


    STATE    INTERNAL_STATE                       SPD_TEXT 
_________ _________________ ______________________________ 
USABLE    PERMANENT         {E(DEMO.DEMO)[A, B, C, D]}     
USABLE    NEW               {EC(DEMO.DEMO)[A, B, C, D]}    

I now have a new SQL Plan Directive, and the difference with the previous one is that the predicate is a simple column equality on each column (EC) rather than an equality involving an expression (E). From that, the optimizer knows that a statistics extension on the column group may help.

SQL> select count(*) c2 from demo where a=0 and b=0 and c=0 and d=0;

    C2 
______ 
   500 

SQL> select * from dbms_xplan.display_cursor(format=>'allstats last');


                                                                       PLAN_TABLE_OUTPUT 
________________________________________________________________________________________ 
SQL_ID  5sg8p03mmx7ca, child number 0                                                    
-------------------------------------                                                    
select count(*) c2 from demo where a=0 and b=0 and c=0 and d=0                           
                                                                                         
Plan hash value: 2180342005                                                              
                                                                                         
-------------------------------------------------------------------------------------    
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |    
-------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |     253 |    
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |     253 |    
|*  2 |   TABLE ACCESS FULL| DEMO |      1 |    500 |    500 |00:00:00.01 |     253 |    
-------------------------------------------------------------------------------------    
                                                                                         
Predicate Information (identified by operation id):                                      
---------------------------------------------------                                      
                                                                                         
   2 - filter(("A"=0 AND "B"=0 AND "C"=0 AND "D"=0))                                     
                                                                                         
Note                                                                                     
-----                                                                                    
   - dynamic statistics used: dynamic sampling (level=AUTO)                              
   - 1 Sql Plan Directive used for this statement       

So, the NEW directive is in USABLE state: the SPD is used to do some dynamic sampling, as with the previous example.

SQL> exec dbms_spd.flush_sql_plan_directive;

PL/SQL procedure successfully completed.

SQL> select state, extract(notes,'/spd_note/internal_state/text()') internal_state, extract(notes,'/spd_note/spd_text/text()') spd_text from dba_sql_plan_directives where directive_id in (select directive_id from dba_sql_plan_dir_objects where owner=user) and type='DYNAMIC_SAMPLING';

    STATE    INTERNAL_STATE                       SPD_TEXT 
_________ _________________ ______________________________ 
USABLE    PERMANENT         {E(DEMO.DEMO)[A, B, C, D]}     
USABLE    MISSING_STATS     {EC(DEMO.DEMO)[A, B, C, D]}    

Here we have an additional state during the learning phase because there's something else that can be done: we are not in HAS_STATS because more statistics can be gathered. We are in the MISSING_STATS internal state. This is a USABLE state, so dynamic sampling continues until we gather statistics.

SQL> select count(*) c3 from demo where a=0 and b=0 and c=0 and d=0;

    C3 
______ 
   500 

SQL> select * from dbms_xplan.display_cursor(format=>'allstats last');

                                                                       PLAN_TABLE_OUTPUT 
________________________________________________________________________________________ 
SQL_ID  d8zyzh140xk0d, child number 0                                                    
-------------------------------------                                                    
select count(*) c3 from demo where a=0 and b=0 and c=0 and d=0                           
                                                                                         
Plan hash value: 2180342005                                                              
                                                                                         
-------------------------------------------------------------------------------------    
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |    
-------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |     253 |    
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |     253 |    
|*  2 |   TABLE ACCESS FULL| DEMO |      1 |    500 |    500 |00:00:00.01 |     253 |    
-------------------------------------------------------------------------------------    
                                                                                         
Predicate Information (identified by operation id):                                      
---------------------------------------------------                                      
                                                                                         
   2 - filter(("A"=0 AND "B"=0 AND "C"=0 AND "D"=0))                                     
                                                                                         
Note                                                                                     
-----                                                                                    
   - dynamic statistics used: dynamic sampling (level=AUTO)                              
   - 1 Sql Plan Directive used for this statement       

That can continue for a long time, with the SPD in USABLE state and dynamic sampling compensating for the missing stats, but at the cost of additional work during hard parse time.

SQL> exec dbms_spd.flush_sql_plan_directive;

PL/SQL procedure successfully completed.

SQL> select created,state, extract(notes,'/spd_note/internal_state/text()') internal_state, extract(notes,'/spd_note/spd_text/text()') spd_text from dba_sql_plan_directives where directive_id in (select directive_id from dba_sql_plan_dir_objects where owner=user) and type='DYNAMIC_SAMPLING' order by last_used;

    CREATED     STATE    INTERNAL_STATE                       SPD_TEXT 
___________ _________ _________________ ______________________________ 
20:52:11    USABLE    PERMANENT         {E(DEMO.DEMO)[A, B, C, D]}     
20:52:16    USABLE    MISSING_STATS     {EC(DEMO.DEMO)[A, B, C, D]}    

The status will not change until statistics gathering occurs.

SQL> exec dbms_stats.set_table_prefs(user,'DEMO','AUTO_STAT_EXTENSIONS','on');

PL/SQL procedure successfully completed.

In the same way that adaptive statistics are not enabled by default, the automatic creation of statistics extensions is not there by default. I enable it for this table only, but, as with many dbms_stats operations, you can do it at schema, database or global level (a sketch of the global setting is shown below). Usually, you do it initially when creating the table, or simply at database level because it works in pair with adaptive statistics, but in this demo I waited in order to show that even if the decision of going to the HAS_STATS or MISSING_STATS state depends on the possibility of extended statistics creation, it is done without looking at the dbms_stats preference.
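
For example, the same preference can be set at global level; a minimal sketch:

SQL> exec dbms_stats.set_global_prefs('AUTO_STAT_EXTENSIONS','ON');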

SQL> exec dbms_stats.gather_table_stats(user,'DEMO', options=>'gather auto');

PL/SQL procedure successfully completed.

Note that I’m gathering the statistics like the automatic job does: GATHER AUTO. As I did not change any rows, the table statistics are not stale but the new directive in MISSING_STATS tells DBMS_STATS that there’s a reason to re-gather the statistics.

And if you look at the statistics extensions, there's a new extension on the (A,B,C,D) column group. Just look at USER_STAT_EXTENSIONS; a query sketch is shown below.
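
A minimal sketch of such a check:

SQL> select extension_name, extension, creator from user_stat_extensions where table_name='DEMO';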

SQL> select count(*) c4 from demo where a=0 and b=0 and c=0 and d=0;

    C4 
______ 
   500 

SQL> select * from dbms_xplan.display_cursor(format=>'allstats last');

                                                                       PLAN_TABLE_OUTPUT 
________________________________________________________________________________________ 
SQL_ID  g08m3qrmw7mgn, child number 0                                                    
-------------------------------------                                                    
select count(*) c4 from demo where a=0 and b=0 and c=0 and d=0                           
                                                                                         
Plan hash value: 2180342005                                                              
                                                                                         
-------------------------------------------------------------------------------------    
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |    
-------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |     253 |    
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |     253 |    
|*  2 |   TABLE ACCESS FULL| DEMO |      1 |    500 |    500 |00:00:00.01 |     253 |    
-------------------------------------------------------------------------------------    
                                                                                         
Predicate Information (identified by operation id):                                      
---------------------------------------------------                                      
                                                                                         
   2 - filter(("A"=0 AND "B"=0 AND "C"=0 AND "D"=0))                                     
                                                                                         
Note                                                                                     
-----                                                                                    
   - dynamic statistics used: dynamic sampling (level=AUTO)                              
   - 1 Sql Plan Directive used for this statement     

You may think that no dynamic sampling is needed anymore but the Adaptive Statistics mechanism is still in the learning phase: the SPD is still USABLE and the next parse will verify if MISSING_STATS can be superseded by HAS_STATS. This is what happened here.

SQL> exec dbms_spd.flush_sql_plan_directive;

PL/SQL procedure successfully completed.

SQL> select created,state, extract(notes,'/spd_note/internal_state/text()') internal_state, extract(notes,'/spd_note/spd_text/text()') spd_text from dba_sql_plan_directives where directive_id in (select directive_id from dba_sql_plan_dir_objects where owner=user) and type='DYNAMIC_SAMPLING' order by last_used;

    CREATED         STATE    INTERNAL_STATE                       SPD_TEXT 
___________ _____________ _________________ ______________________________ 
20:52:11    USABLE        PERMANENT         {E(DEMO.DEMO)[A, B, C, D]}     
20:52:16    SUPERSEDED    HAS_STATS         {EC(DEMO.DEMO)[A, B, C, D]}    

Here, SUPERSEDED means no more dynamic sampling for predicates with simple column equality on A, B, C, D because the directive now HAS_STATS (the column group statistics are in place).

In the past, I mean before 12c, I often recommended enabling dynamic sampling (with optimizer_dynamic_sampling >= 4) on data warehouses, or for sessions running complex ad-hoc queries for reporting. And no dynamic sampling, creating manual statistics extensions only when required, for OLTP where we can expect less complex queries and where hard parse time may be a problem.

Now, in the same vein, I would rather recommend enabling adaptive statistics because it provides finer-grained optimization. As we see here, only one kind of predicate does dynamic sampling, and this dynamic sampling is the "adaptive" one, estimating not only single table cardinality but joins and aggregations as well. This is the USABLE (PERMANENT) directive. The other one did it only temporarily, until statistics extensions were automatically created, and is now SUPERSEDED with HAS_STATS.

In summary, the MISSING_STATS state is seen only when, given the simple column equality, there are possible statistics that are missing. HAS_STATS means that all the statistics that can be used by the optimizer for this predicate are available and no more can be gathered. Each directive goes through HAS_STATS during the learning phase, and then either stays in HAS_STATS or switches definitively to the PERMANENT state when, in HAS_STATS, a misestimate is encountered again.

The article Oracle SPD status on two learning paths appeared first on Blog dbi services.

NTP is not working for ODA new deployment (reimage) in version 19.8?

Fri, 2020-12-18 10:46

Having recently reimaged and patched several ODAs to versions 19.8 and 19.9, I came across an issue with NTP. During my troubleshooting I was able to determine the root cause and find an appropriate solution. In this blog post I would like to share my experience with you.

Symptom/Analysis

ODA version 19.6 and higher come with Oracle Linux 7. Since Oracle Linux 7, the default time synchronization service is no longer ntp but chrony. In Oracle Linux 7, ntp is still available and can still be used, but the ntp service will disappear in Oracle Linux 8.

What I realized from my recent deployments and patching is that:

  • Patching your ODA from 19.6 to version 19.8 or 19.9: the system will still use ntpd and the chronyd service will be deactivated. Everything works fine.
  • Reimaging your ODA to version 19.8: chronyd will be activated and NTP will not work any more.
  • Reimaging your ODA to version 19.9: ntpd will be activated and NTP will work with no problem.

So the problem only occurs if you reimage your ODA to version 19.8.

Problem explanation

The problem is due to the fact that the odacli script deploying the appliance still updates the ntpd configuration (/etc/ntpd.conf) with the NTP addresses provided, and not the chronyd one. But chronyd will be, by default, enabled and then started with no configuration.

Solving the problem

There are two solutions.

A/ Configure and use chronyd

You configure /etc/chrony.conf with the NTP addresses given during appliance creation and you restart the chronyd service.

Configure chrony:

oracle@ODA01:/u01/app/oracle/local/dmk/etc/ [rdbms19.8.0.0] vi /etc/chrony.conf

oracle@ODA01:/u01/app/oracle/local/dmk/etc/ [rdbms19.8.0.0] cat /etc/chrony.conf
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.pool.ntp.org iburst
#server 1.pool.ntp.org iburst
#server 2.pool.ntp.org iburst
#server 3.pool.ntp.org iburst
server 212.X.X.X.103 prefer
server 212.X.X.X.100
server 212.X.X.X.101


# Record the rate at which the system clock gains/losses time.
driftfile /var/lib/chrony/drift

# Allow the system clock to be stepped in the first three updates
# if its offset is larger than 1 second.
makestep 1.0 3

# Enable kernel synchronization of the real-time clock (RTC).
rtcsync

# Enable hardware timestamping on all interfaces that support it.
#hwtimestamp *

# Increase the minimum number of selectable sources required to adjust
# the system clock.
#minsources 2

# Allow NTP client access from local network.
#allow 192.168.0.0/16

# Serve time even if not synchronized to a time source.
#local stratum 10

# Specify file containing keys for NTP authentication.
#keyfile /etc/chrony.keys

# Specify directory for log files.
logdir /var/log/chrony

# Select which information is logged.
#log measurements statistics tracking


And you restart the chrony service:

[root@ODA01 ~]# service chronyd restart
Redirecting to /bin/systemctl restart chronyd.service


B/ Start ntp

Starting ntp will automatically stop the chrony service.

[root@ODA01 ~]# ntpq -p
ntpq: read: Connection refused

[root@ODA01 ~]# service ntpd restart
Redirecting to /bin/systemctl restart ntpd.service


Checking the synchronization:

[root@ODA01 ~]# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
lantime. domain_name .STEP.          16 u    - 1024    0    0.000    0.000   0.000
*ntp1. domain_name    131.188.3.223    2 u  929 1024  377    0.935   -0.053   0.914
+ntp2. domain_name    131.188.3.223    2 u  113 1024  377    0.766    0.184   2.779


Checking both the ntp and chrony services:

[root@ODA01 ~]# service ntpd status
Redirecting to /bin/systemctl status ntpd.service
● ntpd.service - Network Time Service
   Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2020-11-27 09:40:08 CET; 31min ago
  Process: 68548 ExecStart=/usr/sbin/ntpd -u ntp:ntp $OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 68549 (ntpd)
    Tasks: 1
   CGroup: /system.slice/ntpd.service
           └─68549 /usr/sbin/ntpd -u ntp:ntp -g

Nov 27 09:40:08 ODA01 ntpd[68549]: ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
Nov 27 09:40:08 ODA01 ntpd[68549]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
Nov 27 09:40:08 ODA01 ntpd[68549]: Listen normally on 1 lo 127.0.0.1 UDP 123
Nov 27 09:40:08 ODA01 ntpd[68549]: Listen normally on 2 btbond1 10.X.X.10 UDP 123
Nov 27 09:40:08 ODA01 ntpd[68549]: Listen normally on 3 priv0 192.X.X.24 UDP 123
Nov 27 09:40:08 ODA01 ntpd[68549]: Listen normally on 4 virbr0 192.X.X.1 UDP 123
Nov 27 09:40:08 ODA01 ntpd[68549]: Listening on routing socket on fd #21 for interface updates
Nov 27 09:40:08 ODA01 ntpd[68549]: 0.0.0.0 c016 06 restart
Nov 27 09:40:08 ODA01 ntpd[68549]: 0.0.0.0 c012 02 freq_set kernel 0.000 PPM
Nov 27 09:40:08 ODA01 ntpd[68549]: 0.0.0.0 c011 01 freq_not_set

[root@ODA01 ~]# service chronyd status
Redirecting to /bin/systemctl status chronyd.service
● chronyd.service - NTP client/server
   Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Fri 2020-11-27 09:40:08 CET; 32min ago
     Docs: man:chronyd(8)
           man:chrony.conf(5)
  Process: 46183 ExecStartPost=/usr/libexec/chrony-helper update-daemon (code=exited, status=0/SUCCESS)
  Process: 46180 ExecStart=/usr/sbin/chronyd $OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 46182 (code=exited, status=0/SUCCESS)

Nov 27 09:18:25 ODA01 systemd[1]: Starting NTP client/server...
Nov 27 09:18:25 ODA01 chronyd[46182]: chronyd version 3.4 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +SECHASH +IPV6 +DEBUG)
Nov 27 09:18:25 ODA01 chronyd[46182]: Frequency 0.000 +/- 1000000.000 ppm read from /var/lib/chrony/drift
Nov 27 09:18:25 ODA01 systemd[1]: Started NTP client/server.
Nov 27 09:40:08 ODA01 systemd[1]: Stopping NTP client/server...
Nov 27 09:40:08 ODA01 systemd[1]: Stopped NTP client/server.


You might need to disable the chronyd service with systemctl to prevent chronyd from starting automatically after a server reboot.
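
A minimal sketch, assuming you want ntpd to stay the active time service (the host prompt is just the one used in this post):

[root@ODA01 ~]# systemctl disable chronyd
[root@ODA01 ~]# systemctl enable ntpd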

Are you getting a socket error with chrony?

If you are getting the following error when starting chrony, you will need to pass the appropriate option so that chronyd starts with IPv4 only:

Nov 27 09:09:19 ODA01 chronyd[35107]: Could not open IPv6 command socket : Address family not supported by protocol.


Example of the error encountered:

[root@ODA01 ~]# service chronyd status
Redirecting to /bin/systemctl status chronyd.service
● chronyd.service - NTP client/server
   Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2020-11-27 09:09:19 CET; 5min ago
     Docs: man:chronyd(8)
           man:chrony.conf(5)
  Process: 35109 ExecStartPost=/usr/libexec/chrony-helper update-daemon (code=exited, status=0/SUCCESS)
  Process: 35105 ExecStart=/usr/sbin/chronyd $OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 35107 (chronyd)
    Tasks: 1
   CGroup: /system.slice/chronyd.service
           └─35107 /usr/sbin/chronyd

Nov 27 09:09:19 ODA01 systemd[1]: Starting NTP client/server...
Nov 27 09:09:19 ODA01 chronyd[35107]: chronyd version 3.4 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +SECHASH +IPV6 +DEBUG)
Nov 27 09:09:19 ODA01 chronyd[35107]: Could not open IPv6 command socket : Address family not supported by protocol
Nov 27 09:09:19 ODA01 chronyd[35107]: Frequency 0.000 +/- 1000000.000 ppm read from /var/lib/chrony/drift
Nov 27 09:09:19 ODA01 systemd[1]: Started NTP client/server.


The chronyd system service uses a variable to set its options:

[root@ODA01 ~]# cat /usr/lib/systemd/system/chronyd.service
[Unit]
Description=NTP client/server
Documentation=man:chronyd(8) man:chrony.conf(5)
After=ntpdate.service sntp.service ntpd.service
Conflicts=ntpd.service systemd-timesyncd.service
ConditionCapability=CAP_SYS_TIME

[Service]
Type=forking
PIDFile=/var/run/chrony/chronyd.pid
EnvironmentFile=-/etc/sysconfig/chronyd
ExecStart=/usr/sbin/chronyd $OPTIONS
ExecStartPost=/usr/libexec/chrony-helper update-daemon
PrivateTmp=yes
ProtectHome=yes
ProtectSystem=full

[Install]
WantedBy=multi-user.target


We need to add the -4 option to the chronyd service configuration file:

[root@ODA01 ~]# cat /etc/sysconfig/chronyd
# Command-line options for chronyd
OPTIONS=""

[root@ODA01 ~]# vi /etc/sysconfig/chronyd

[root@ODA01 ~]# cat /etc/sysconfig/chronyd
# Command-line options for chronyd
OPTIONS="-4"


Then you just need to restart the chrony service:

[root@ODA01 ~]# service chronyd restart
Redirecting to /bin/systemctl restart chronyd.service

[root@ODA01 ~]# service chronyd status
Redirecting to /bin/systemctl status chronyd.service
● chronyd.service - NTP client/server
   Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2020-11-27 09:18:25 CET; 4s ago
     Docs: man:chronyd(8)
           man:chrony.conf(5)
  Process: 46183 ExecStartPost=/usr/libexec/chrony-helper update-daemon (code=exited, status=0/SUCCESS)
  Process: 46180 ExecStart=/usr/sbin/chronyd $OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 46182 (chronyd)
    Tasks: 1
   CGroup: /system.slice/chronyd.service
           └─46182 /usr/sbin/chronyd -4

Nov 27 09:18:25 ODA01 systemd[1]: Starting NTP client/server...
Nov 27 09:18:25 ODA01 chronyd[46182]: chronyd version 3.4 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +SECHASH +IPV6 +DEBUG)
Nov 27 09:18:25 ODA01 chronyd[46182]: Frequency 0.000 +/- 1000000.000 ppm read from /var/lib/chrony/drift
Nov 27 09:18:25 ODA01 systemd[1]: Started NTP client/server.


Finally, you can use the following command to check NTP synchronization with chronyd:

[root@ODA01 ~]# chronyc tracking


The article NTP is not working for ODA new deployment (reimage) in version 19.8? first appeared on the dbi services Blog.

Efficiently query DBA_EXTENTS for FILE_ID / BLOCK_ID

Tue, 2020-12-15 02:15
By Franck Pachot

This was initially posted to CERN Database blog on Thursday, 27 September 2018 where it seems to be lost. Here is a copy thanks to web.archive.org

Did you ever try to query DBA_EXTENTS on a very large database with LMT tablespaces? I had to, in the past, in order to find which segment a corrupt block belonged to. The information about extent allocation is stored in the datafile headers, visible through X$KTFBUE, and queries on it can be very expensive. In addition to that, the optimizer tends to start with the segments and get to this X$KTFBUE for each of them. At the time, I had quickly created a view on the internal dictionary tables, forcing it to start with X$KTFBUE via a materialized CTE, to replace DBA_EXTENTS. I published this on dba-village in 2006.

I recently wanted to know the segment/extent for a hot block, identified by its file_id and block_id on a 900TB database with 7000 datafiles and 90000 extents, so I went back to this old query and got my result in 1 second. The idea is to be sure that we start with the file (X$KCCFE) and then get to the extent allocation (X$KTFBUE) before going to the segments:

So here is the query:


column owner format a6
column segment_type format a20
column segment_name format a15
column partition_name format a15
set linesize 200
set timing on time on echo on autotrace on stat
WITH
 l AS ( /* LMT extents indexed on ktfbuesegtsn,ktfbuesegfno,ktfbuesegbno */
  SELECT ktfbuesegtsn segtsn,ktfbuesegfno segrfn,ktfbuesegbno segbid, ktfbuefno extrfn,
         ktfbuebno fstbid,ktfbuebno + ktfbueblks - 1 lstbid,ktfbueblks extblks,ktfbueextno extno
  FROM sys.x$ktfbue
 ),
 d AS ( /* DMT extents ts#, segfile#, segblock# */
  SELECT ts# segtsn,segfile# segrfn,segblock# segbid, file# extrfn,
         block# fstbid,block# + length - 1 lstbid,length extblks, ext# extno
  FROM sys.uet$
 ),
 s AS ( /* segment information for the tablespace that contains afn file */
  SELECT /*+ materialized */
  f1.fenum afn,f1.ferfn rfn,s.ts# segtsn,s.FILE# segrfn,s.BLOCK# segbid ,s.TYPE# segtype,f2.fenum segafn,t.name tsname,blocksize
  FROM sys.seg$ s, sys.ts$ t, sys.x$kccfe f1,sys.x$kccfe f2 
  WHERE s.ts#=t.ts# AND t.ts#=f1.fetsn AND s.FILE#=f2.ferfn AND s.ts#=f2.fetsn
 ),
 m AS ( /* extent mapping for the tablespace that contains afn file */
SELECT /*+ use_nl(e) ordered */
 s.afn,s.segtsn,s.segrfn,s.segbid,extrfn,fstbid,lstbid,extblks,extno, segtype,s.rfn, tsname,blocksize
 FROM s,l e
 WHERE e.segtsn=s.segtsn AND e.segrfn=s.segrfn AND e.segbid=s.segbid
 UNION ALL
 SELECT /*+ use_nl(e) ordered */ 
 s.afn,s.segtsn,s.segrfn,s.segbid,extrfn,fstbid,lstbid,extblks,extno, segtype,s.rfn, tsname,blocksize
 FROM s,d e
  WHERE e.segtsn=s.segtsn AND e.segrfn=s.segrfn AND e.segbid=s.segbid
 UNION ALL
 SELECT /*+ use_nl(e) use_nl(t) ordered */
 f.fenum afn,null segtsn,null segrfn,null segbid,f.ferfn extrfn,e.ktfbfebno fstbid,e.ktfbfebno+e.ktfbfeblks-1 lstbid,e.ktfbfeblks extblks,null extno, null segtype,f.ferfn rfn,name tsname,blocksize
 FROM sys.x$kccfe f,sys.x$ktfbfe e,sys.ts$ t
 WHERE t.ts#=f.fetsn and e.ktfbfetsn=f.fetsn and e.ktfbfefno=f.ferfn
 UNION ALL
 SELECT /*+ use_nl(e) use_nl(t) ordered */
 f.fenum afn,null segtsn,null segrfn,null segbid,f.ferfn extrfn,e.block# fstbid,e.block#+e.length-1 lstbid,e.length extblks,null extno, null segtype,f.ferfn rfn,name tsname,blocksize
 FROM sys.x$kccfe f,sys.fet$ e,sys.ts$ t
 WHERE t.ts#=f.fetsn and e.ts#=f.fetsn and e.file#=f.ferfn
 ),
 o AS (
  SELECT s.tablespace_id segtsn,s.relative_fno segrfn,s.header_block   segbid,s.segment_type,s.owner,s.segment_name,s.partition_name
  FROM SYS_DBA_SEGS s
 ),
datafile_map as (
SELECT
 afn file_id,fstbid block_id,extblks blocks,nvl(segment_type,decode(segtype,null,'free space','type='||segtype)) segment_type,
 owner,segment_name,partition_name,extno extent_id,extblks*blocksize bytes,
 tsname tablespace_name,rfn relative_fno,m.segtsn,m.segrfn,m.segbid
 FROM m,o WHERE extrfn=rfn and m.segtsn=o.segtsn(+) AND m.segrfn=o.segrfn(+) AND m.segbid=o.segbid(+)
UNION ALL
SELECT
 file_id+(select to_number(value) from v$parameter WHERE name='db_files') file_id,
 1 block_id,blocks,'tempfile' segment_type,
 '' owner,file_name segment_name,'' partition_name,0 extent_id,bytes,
  tablespace_name,relative_fno,0 segtsn,0 segrfn,0 segbid
 FROM dba_temp_files
)
select * from datafile_map where file_id=5495 and 11970455 between block_id and block_id+blocks

And here is the result, with execution statistics:



   FILE_ID   BLOCK_ID     BLOCKS SEGMENT_TYPE         OWNER  SEGMENT_NAME    PARTITION_NAME    EXTENT_ID      BYTES TABLESPACE_NAME      RELATIVE_FNO     SEGTSN     SEGRFN    SEGBID
---------- ---------- ---------- -------------------- ------ --------------- ---------------- ---------- ---------- -------------------- ------------ ---------- ---------- ----------
      5495   11964544            8192 INDEX PARTITION LHCLOG DN_PK           PART_DN_20161022 1342         67108864 LOG_DATA_20161022            1024       6364       1024        162

Elapsed: 00:00:01.25

Statistics
----------------------------------------------------------
        103  recursive calls
       1071  db block gets
      21685  consistent gets
        782  physical reads
        840  redo size
       1548  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Knowing the segment from the block address is important in performance tuning, when we get the file_id/block_id from wait event parameters. It is even more important when a block corruption is detected, and having a fast query may help.

The article Efficiently query DBA_EXTENTS for FILE_ID / BLOCK_ID first appeared on the dbi services Blog.

Amazon Aurora: calling a lambda from a trigger

Mon, 2020-12-14 15:13
By Franck Pachot

You may want your RDS database to interact with other AWS services. For example, send a notification on a business or administration situation with a “push” method, rather than a “pull” one from a CloudWatch alert. You may even design this call to be triggered on database changes. And Amazon Aurora provides this possibility by running a lambda from the database, through calling mysql.lambda_async() from a MySQL trigger. This is an interesting feature, but I think that it is critical to understand how it works in order to use it correctly.
This is the kind of feature that looks very nice on a whiteboard or powerpoint: the DML event (like an update) runs a trigger that calls the lambda, all event-driven. However, this is also dangerous: are you sure that every update must execute this process? What about an update during an application release, or a dump import, or a logical replication target? Now imagine that you have a bug in your application that has set some wrong data and you have to fix it in emergency in the production database, under stress, with manual updates and aren’t aware of that trigger, or just forget about it in this situation… Do you want to take this risk? As the main idea is to run some external service, the consequence might be very visible and hard to fix, like spamming all your users, or involuntarily DDoS a third-tier application.

I highly encourage you to encapsulate the DML and the call of the lambda in a procedure that is clearly named and described. For example, let’s take a silly example: sending a “your address has been changed” message when a user updates his address. Don’t put the “send message” call in an AFTER UPDATE trigger, because the UPDATE semantic is to update, not to send a message. What you can do is write a stored procedure like UPDATE_ADDRESS() that will do the UPDATE and call the “send message” lambda (a sketch is shown below). You may even provide a boolean parameter to enable or not the sending of the message. Then, the ones who call the stored procedure know what will happen. And the ones who just do an update… will just do an update. Actually, executing DML directly from the application is often a mistake. A database should expose business-related data services, like many other components of your application architecture, and this is exactly the goal of stored procedures.
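
As a purely illustrative sketch (the table, columns and lambda ARN below are hypothetical; only mysql.lambda_async() is the real Aurora call), such a procedure could look like this:

delimiter $$
create procedure UPDATE_ADDRESS(
  in p_user_id  int,
  in p_address  varchar(200),
  in p_notify   boolean)
begin
  -- the business update itself
  update users set address = p_address where id = p_user_id;
  -- optionally notify through the lambda, knowing this call is not transactional
  if p_notify then
    call mysql.lambda_async(
      'arn:aws:lambda:eu-central-1:000000000000:function:send-message',
      concat('{"user_id":"', p_user_id, '"}'));
  end if;
end;
$$
delimiter ;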

I’m sharing here some tests on calling lambda from Aurora MySQL.

Wiring the database to lambdas

A lambda is not a simple procedure that you embed in your program. It is a service and you have to control the access to it:

  • You create the lambda (create function, deploy and get the ARN)
  • You define an IAM policy to invoke this lambda (a sketch is shown after this list)
  • You define an IAM role to apply this policy
  • You set this role as aws_default_lambda_role in the RDS cluster parameter group
  • You add this role to the cluster (RDS -> database cluster -> Manage IAM roles)
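
As an illustration, the IAM policy to invoke the lambda is a standard one; something like this, using the ARN of my test lambda, should be enough:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "lambda:InvokeFunction",
      "Resource": "arn:aws:lambda:eu-central-1:802756008554:function:log-me"
    }
  ]
}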

Here is my lambda which just logs the event for my test:


import json

def lambda_handler(event, context):
    print('Hello.')
    print(event)
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

Creating the test database

 drop database if exists demo;
 create database demo;
 use demo;
 drop table if exists t;
 create table t ( x int, y int );
 insert into t values ( 1, 1 );

I have a simple table here, with a simple row.


delimiter $$
create trigger t_bufer before update on t for each row
begin
 set NEW.y=OLD.x;
 call mysql.lambda_async(
    'arn:aws:lambda:eu-central-1:802756008554:function:log-me',
    concat('{"trigger":"t_bufer","connection":"',connection_id(),'","old": "',OLD.x,'","new":"',NEW.x,'"}'));
end;
$$
delimiter ;

This is my trigger which calls my lambda on an update with old and new value in the message.


MYSQL_PS1="Session 1 \R:\m:\s> " mysql -v -A --host=database-1.cluster-ce5fwv4akhjp.eu-central-1.rds.amazonaws.com --port=3306 --user=admin --password=ServerlessV2

I connect a first session , displaying the time and session in the prompt.


Session 1 23:11:55> use demo;
Database changed

Session 1 23:12:15> truncate table t;
--------------
truncate table t
--------------

Query OK, 0 rows affected (0.09 sec)

Session 1 23:12:29> insert into t values ( 1, 1 );
--------------
insert into t values ( 1, 1 )
--------------

Query OK, 1 row affected (0.08 sec)

This just resets the test case when I want to re-run it.


Session 1 23:12:36> start transaction;
--------------
start transaction
--------------

Query OK, 0 rows affected (0.07 sec)

Session 1 23:12:48> update t set x = 42;
--------------
update t set x = 42
--------------

Query OK, 1 row affected (0.11 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Session 1 23:12:55> rollback;
--------------
rollback
--------------

Query OK, 0 rows affected (0.02 sec)

I updated one row, and rolled back my transaction. This is to show that you must be aware that calling a lambda is out of the ACID protection of relational databases. The trigger is executed during the update, without knowing if the transaction will be committed or not (voluntarily or because an exception is encountered). When you do only things in the database (like writing into another table) there is no problem because this happens within the transaction. If the transaction is rolled back, all the DML done by the triggers are rolled back as well. Even if they occurred, nobody sees their effect, except the current session, before the whole transaction is committed.

But when you call a lambda, synchronously or asynchronously, the call is executed and its effect will not be rolled back if your transaction does not complete. This can be ok in some cases, if what you execute is related to the intention of the update and not its completion. Or you must manage this exception in your lambda, maybe by checking in the database that the transaction occurred. But in that case, you should really question your architecture (a call to a service, calling back to the caller…)

So… be careful with that. If your lambda is there to be executed when a database operation has been done, it may have to be done after the commit, in the procedural code that has executed the transaction.

Another test…

This non-ACID execution was the important point I wanted to emphasize, so you can stop here if you want. This other test is probably interesting for people used to Oracle Database only. In general, nothing guarantees that a trigger is executed only once for the triggering operation. What we have seen above (rollback) can be done internally when a serialization exception is encountered and the database can retry the operation. Oracle Database has non-blocking reads, and this is not only for SELECT but also for the read phase of an UPDATE. You may have to read a lot of rows to verify the predicate and update only a few of them, and you don’t want to lock all the rows read but only the ones that are updated. Manually, you would do that with a serializable transaction and retry in case you encounter rows that have been modified between your MVCC snapshot and the current update time. But at statement level, Oracle does that for you.

It seems that it does not happen in Aurora MySQL and PostgreSQL, as the locking for reads is more aggressive, but just in case I tested the same scenario where an update restart would have occurred in Oracle.


Session 1 23:13:00> start transaction;
--------------
start transaction
--------------

Query OK, 0 rows affected (0.06 sec)

Session 1 23:13:09> update t set x = x+1;
--------------
update t set x = x+1
--------------

Query OK, 1 row affected (0.02 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Session 1 23:13:16> select * from t;
--------------
select * from t
--------------

+------+------+
| x    | y    |
+------+------+
|    2 |    1 |
+------+------+
1 row in set (0.01 sec)

Session 1 23:13:24>

I have started a transaction that increased the value of X, but the transaction is still open. What I do next is from another session.

This is session 2:


Session 2 23:13:32> use demo;

Database changed
Session 2 23:13:34>
Session 2 23:13:35> select * from t;
--------------
select * from t
--------------

+------+------+
| x    | y    |
+------+------+
|    1 |    1 |
+------+------+
1 row in set (0.01 sec)

Of course, thanks to transaction isolation, I do not see the uncommitted change.


Session 2 23:13:38> update t set x = x+1 where x > 0;
--------------
update t set x = x+1 where x > 0
--------------

At this step, the update hangs on the locked row.

Now back in the first session:


Session 1 23:13:49>
Session 1 23:13:50>
Session 1 23:13:50>
Session 1 23:13:50> commit;
--------------
commit
--------------

Query OK, 0 rows affected (0.02 sec)

I just committed my change here, so X has been increased to the value 2.

And here is what happened in my second session, with the lock released by the first session:


Query OK, 1 row affected (11.42 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Session 2 23:13:58> commit;
--------------
commit
--------------

Query OK, 0 rows affected (0.01 sec)

Session 2 23:14:10> select * from t;
--------------
select * from t
--------------

+------+------+
| x    | y    |
+------+------+
|    3 |    2 |
+------+------+
1 row in set (0.01 sec)

Session 2 23:14:18>

This is the correct behavior. Even if a select sees the value of X=1 the update cannot be done until the first session has committed its transaction. This is why it waited, and it has read the committed value of X=2 which is then incremented to 3.

And finally here is what was logged by my lambda, as a screenshot and as text:


2020-12-13T23:12:55.558+01:00	START RequestId: 39e4e41f-7853-4b11-a12d-4a3147be3fc7 Version: $LATEST	2020/12/13/[$LATEST]25e73c8c6f9e4d168fa29b9ad2ba76d8
2020-12-13T23:12:55.561+01:00	Hello.	2020/12/13/[$LATEST]25e73c8c6f9e4d168fa29b9ad2ba76d8
2020-12-13T23:12:55.561+01:00	{'trigger': 't_bufer', 'connection': '124', 'old': '1', 'new': '42'}	2020/12/13/[$LATEST]25e73c8c6f9e4d168fa29b9ad2ba76d8
2020-12-13T23:12:55.562+01:00	END RequestId: 39e4e41f-7853-4b11-a12d-4a3147be3fc7	2020/12/13/[$LATEST]25e73c8c6f9e4d168fa29b9ad2ba76d8
2020-12-13T23:12:55.562+01:00	REPORT RequestId: 39e4e41f-7853-4b11-a12d-4a3147be3fc7 Duration: 1.16 ms Billed Duration: 2 ms Memory Size: 128 MB Max Memory Used: 51 MB	2020/12/13/[$LATEST]25e73c8c6f9e4d168fa29b9ad2ba76d8
2020-12-13T23:13:16.620+01:00	START RequestId: 440128db-d6de-4b2c-aa98-d7bedf12a3d4 Version: $LATEST	2020/12/13/[$LATEST]25e73c8c6f9e4d168fa29b9ad2ba76d8
2020-12-13T23:13:16.624+01:00	Hello.	2020/12/13/[$LATEST]25e73c8c6f9e4d168fa29b9ad2ba76d8
2020-12-13T23:13:16.624+01:00	{'trigger': 't_bufer', 'connection': '124', 'old': '1', 'new': '2'}	2020/12/13/[$LATEST]25e73c8c6f9e4d168fa29b9ad2ba76d8
2020-12-13T23:13:16.624+01:00	END RequestId: 440128db-d6de-4b2c-aa98-d7bedf12a3d4	2020/12/13/[$LATEST]25e73c8c6f9e4d168fa29b9ad2ba76d8
2020-12-13T23:13:16.624+01:00	REPORT RequestId: 440128db-d6de-4b2c-aa98-d7bedf12a3d4 Duration: 1.24 ms Billed Duration: 2 ms Memory Size: 128 MB Max Memory Used: 51 MB	2020/12/13/[$LATEST]25e73c8c6f9e4d168fa29b9ad2ba76d8
2020-12-13T23:13:58.156+01:00	START RequestId: c50ceab7-6e75-4e43-b77d-26c1f6347fec Version: $LATEST	2020/12/13/[$LATEST]25e73c8c6f9e4d168fa29b9ad2ba76d8
2020-12-13T23:13:58.160+01:00	Hello.	2020/12/13/[$LATEST]25e73c8c6f9e4d168fa29b9ad2ba76d8
2020-12-13T23:13:58.160+01:00	{'trigger': 't_bufer', 'connection': '123', 'old': '2', 'new': '3'}	2020/12/13/[$LATEST]25e73c8c6f9e4d168fa29b9ad2ba76d8
2020-12-13T23:13:58.160+01:00	END RequestId: c50ceab7-6e75-4e43-b77d-26c1f6347fec	2020/12/13/[$LATEST]25e73c8c6f9e4d168fa29b9ad2ba76d8
2020-12-13T23:13:58.160+01:00	REPORT RequestId: c50ceab7-6e75-4e43-b77d-26c1f6347fec Duration: 0.91 ms Billed Duration: 1 ms Memory Size: 128 MB Max Memory Used: 51 MB	2020/12/13/[$LATEST]25e73c8c6f9e4d168fa29b9ad2ba76d8

First, we see at 23:12:55 the update from X=1 to X=42 that I rolled back later. This proves that the call to the lambda is not transactional. It may sound obvious, but if you come from Oracle Database you would have used Advanced Queuing, where the queue is stored in an RDBMS table and then benefits from sharing the same transaction as the submitter.
My update occurred at 23:12:48 but remember that those calls are asynchronous so the log happens a bit later.

Then there was my second test where I updated, at 23:13:09, X from 1 to 2, and we see this update logged at 23:13:16, which is after the update (because the call is asynchronous) but before the commit, which happened at 23:13:50 according to my session log above. So there is no doubt that the execution of the lambda does not wait for the completion of the transaction that triggered it.

And then the update from the session 2 which was executed at 23:13:38 but returned at 23:13:50 as it was waiting for the first session to end its transaction. The lambda log at 23:13:58 shows it and shows that the old value is X=2 which is expected as the update was done after the first session change. This is where, in Oracle, we would have seen two entries: one updating from X=1, because this would have been read without lock, and then rolled back to restart the update after X=2. But we don’t have this problem here as MySQL acquires a row lock during the read phase.

However, nothing guarantees that there are no internal rollback + restart. And anyway, rollback can happen for many reasons and you should think, during design, whether the call to the lambda should occur for DML intention or DML completion. For example, if you use it for some event sourcing, you may accept the asynchronous delay, but you don’t want to receive an event that actually didn’t occur.

The article Amazon Aurora: calling a lambda from a trigger first appeared on the dbi services Blog.

Validate your SQL Server infrastructure with dbachecks

Mon, 2020-12-14 11:12
Introduction

In this blog post, I’ll do an introduction to the PowerShell module dbachecks.
dbachecks uses Pester and dbatools to validate your SQL Server infrastructure.
With very minimal configuration you can check that your infrastructure is configured following standard best practices or your own policy.

We will see the following topics

– Prerequisites for dbachecks Installation
– Introduction to Pester
– Perform a Check
– Manage the Configuration items – Import & Export
– Output

– Power BI dashboard

Prerequisites for dbachecks Installation

The dbachecks module depends on the following modules:

  • dbatools
  • Pester
  • PSFramework

The easiest way to perform the installation is to do a simple Install-Module. It will get the latest dbachecks version from the PSGallery and install all the required modules up to date.

I had many issues with this method.
The latest versions of PSFramework (1.4.150) did not seem to work with the current dbachecks version.
Installing the latest version of Pester (5.0.4) brings issues too.
When running a command I would get the following error:

Unable to find type [Pester.OutputTypes].
At line:219 char:9
+         [Pester.OutputTypes]$Show = 'All'
+         ~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (Pester.OutputTypes:TypeName) [], RuntimeException
    + FullyQualifiedErrorId : TypeNotFound

To avoid this, prior to installing dbachecks, you should first install PSFramework with version 1.1.59.
Pester is already shipped with recent versions of Windows, in version 3.4.
If you want to get a newer version, install version 4 manually. Issues seem to come with version 5.

Set-PSRepository -Name "PSGallery" -InstallationPolicy Trusted

Install-Module PSFramework -RequiredVersion 1.1.59
Install-Module Pester -RequiredVersion 4.10.1 -Force -SkipPublisherCheck
Install-Module dbachecks

Here is what I got working:

Pester

dbachecks relies heavily on Pester. Pester is a framework that brings functions to build unit tests for PowerShell code.
If you don’t know what Pester is, I’d recommend reading my introduction to Pester post here.

dbatools

The checks performed by dbachecks are based on dbatools functions. If you haven’t tried dbatools yet, I’d recommend having a look at the dbatools repository and trying a few commands.

Perform a Check

Now let’s talk about dbachecks. It is basically a set of Pester tests for your SQL Server infrastructure, with code relying heavily on the dbatools module.
Let’s look at the list of available “Checks” from dbachecks with Get-DbcCheck.

As you can see, there are currently 134 checks available, covering a wide range of configurations you might want to check.

Let’s run a Check on an SQL Server instance. To do so we use the Invoke-DbcCheck command with the Check UniqueTag and the target Instance name.
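
For example, a minimal invocation might look like this (the instance name is an assumption, adapt it to your environment):

Invoke-DbcCheck -SqlInstance localhost -Check ValidDatabaseOwner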

This one checks for the database owner for all user databases of the instance. The default value for this check is configured to “sa”.
My check returned everything green. There’s only one database on this instance and its database owner is “sa”.

Check multiple instances

There are many ways to run checks against multiple instances.
You can define a list of instances in the config parameter with the command below. I’ll come to configuration elements in a minute.

Set-DbcConfig -Name app.sqlinstance -Value "server1\InstA", "localhost", "server2\instA"

Here I will use a CMS and the dbatools command Get-DbaRegisteredServer to get my list of instances (see the sketch below). On the other instance, one of the databases has a non-“sa” database owner.
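
Something along these lines (the CMS name is hypothetical):

Get-DbaRegisteredServer -SqlInstance MyCMS | Invoke-DbcCheck -Check ValidDatabaseOwner
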
Maybe this owner is a valid one and I want to have this check succeed. We can modify the check configuration.

Check Configuration elements

All checks can have configuration elements.
To search in the configuration elements you can use Get-DbcConfig. Since I want to change the accepted database owner name, I can search for all config items with names like “owner”.
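
A minimal way to do that search, as a sketch:

Get-DbcConfig | Where-Object Name -like '*owner*'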

The configuration values are also available with Get-DbcConfigValue.

So now, with Set-DbcConfig I can add a valid database owner to the ValidDatabaseOwner check.
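
As a sketch (the configuration item name policy.validdbowner.name is what I would expect here and the owner value is an example; double-check the name with Get-DbcConfig on your installation):

Set-DbcConfig -Name policy.validdbowner.name -Value 'sa', 'MYDOMAIN\dba_admin'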

Here is the output of the same check run again:

Of course, multiple tests can be run at the same time, for example:
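
For instance, reusing two checks that appear later in this post:

Invoke-DbcCheck -Check ValidDatabaseOwner, ErrorLogCount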

Manage the Configuration items – Import & Export

We have seen how to use Set-DbcConfig to modify your checks configuration. You don’t need to change those configurations one by one every time you want to check your infrastructure.
All configuration items can be exported to a JSON file and imported back again.

I can set the configuration items as needed and then do Export-DbcConfig specifying the destination file:

# LastFullBackup - Maximum number of days before Full Backups are considered outdated
Set-DbcConfig -Name policy.backup.fullmaxdays -Value 7

# Percent disk free
Set-DbcConfig -Name policy.diskspace.percentfree -Value 5

# The maximum percentage variance that the last run of a job is allowed over the average for that job
Set-DbcConfig -Name agent.lastjobruntime.percentage -Value 20
# The maximum percentage variance that a currently running job is allowed over the average for that job
Set-DbcConfig -Name agent.longrunningjob.percentage -Value 20

# Maximum job history log size (in rows). The value -1 means disabled
Set-DbcConfig -Name agent.history.maximumhistoryrows -Value 10000

# The maximum number of days to check for failed jobs
Set-DbcConfig -Name agent.failedjob.since -Value 8

# The number of days prior to check for error log issues - default 2
Set-DbcConfig -Name agent.failedjob.since -Value 3

Export-DbcConfig -Path "$($HOME)\Documents\WindowsPowerShell\MorningCheck-Qual.json"

Here is the output of the Export-DbcConfig:

As you can guess imports of Config files are done with Import-DbcConfig.

Import-DbcConfig -Path "$($HOME)\Documents\WindowsPowerShell\MorningCheck-Qual.json"
Output

The Show parameter

The dbachecks output in the console gives a great level of detail on what is going on. When you have thousands of checks running you might not want to get this wall of green text.
To show only the Failed checks you can use the -Show parameter of Invoke-DbcCheck with the value “Fails”.

Invoke-DbcCheck -Check ValidDatabaseOwner -Show Fails

If you want even fewer details, you can use -Show Summary.

XML files

Test results can also be saved to XML files using the OutputFile parameter, like this:
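
The command could look like this (the NUnitXml output format is my assumption, matching the XML shown below):

Invoke-DbcCheck -Check ValidDatabaseOwner -OutputFormat NUnitXml -OutputFile "C:\temp\ValidDatabaseOwner.xml"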

Here is an output example:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<test-results xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="nunit_schema_2.5.xsd" name="Pester" total="2" errors="0" failures="1" not-run="0" inconclusive="0" ignored="0" skipped="0" invalid="0" date="2020-12-14" time="15:29:47">
  <environment clr-version="4.0.30319.42000" user-domain="win10vm4" cwd="C:\Users\win10vm4admin\Documents\WindowsPowerShell\Modules\dbachecks\2.0.7\checks" platform="Microsoft Windows 10 Pro|C:\WINDOWS|\Device\Harddisk0\Partition4" machine-name="win10vm4" nunit-version="2.5.8.0" os-version="10.0.18363" user="win10vm4admin" />
  <culture-info current-culture="en-US" current-uiculture="en-US" />
  <test-suite type="TestFixture" name="Pester" executed="True" result="Failure" success="False" time="0.3166" asserts="0" description="Pester">
    <results>
      <test-suite type="TestFixture" name="C:\Users\win10vm4admin\Documents\WindowsPowerShell\Modules\dbachecks\2.0.7\checks\Database.Tests.ps1" executed="True" result="Failure" success="False" time="0.3166" asserts="0" description="C:\Users\win10vm4admin\Documents\WindowsPowerShell\Modules\dbachecks\2.0.7\checks\Database.Tests.ps1">
        <results>
          <test-suite type="TestFixture" name="Valid Database Owner" executed="True" result="Failure" success="False" time="0.2048" asserts="0" description="Valid Database Owner">
            <results>
              <test-suite type="TestFixture" name="Testing Database Owners on localhost" executed="True" result="Failure" success="False" time="0.1651" asserts="0" description="Testing Database Owners on localhost">
                <results>
                  <test-case description="Database dbi_tools - owner sa should be in this list ( sa ) on win10vm4" name="Valid Database Owner.Testing Database Owners on localhost.Database dbi_tools - owner sa should be in this list ( sa ) on win10vm4" time="0.0022" asserts="0" success="True" result="Success" executed="True" />
                  <test-case description="Database testDB - owner win10vm4\win10vm4admin should be in this list ( sa ) on win10vm4" name="Valid Database Owner.Testing Database Owners on localhost.Database testDB - owner win10vm4\win10vm4admin should be in this list ( sa ) on win10vm4" time="0.0043" asserts="0" success="False" result="Failure" executed="True">
                    <failure>
                      <message>Expected collection sa to contain 'win10vm4\win10vm4admin', because The account that is the database owner is not what was expected, but it was not found.</message>
                      <stack-trace>at &lt;ScriptBlock&gt;, C:\Users\win10vm4admin\Documents\WindowsPowerShell\Modules\dbachecks\2.0.7\checks\Database.Tests.ps1: line 172
172:                         $psitem.Owner | Should -BeIn $TargetOwner -Because "The account that is the database owner is not what was expected"</stack-trace>
                    </failure>
                  </test-case>
                </results>
              </test-suite>
            </results>
          </test-suite>
        </results>
      </test-suite>
    </results>
  </test-suite>
</test-results>

These XML files can be used to automate reporting with the tool of your choice.

Excel export

There’s a way to export the results to Excel. If you want to try it, I’d recommend reading Jess Pomfret’s blog post dbachecks meets ImportExcel.

Power BI dashboard

Checks can be displayed in a beautiful PowerBI dashboard.

The Update-DbcPowerBiDataSource command converts the results and exports them as files in the format required by the Power BI dashboard launched with the Start-DbcPowerBi command.

The Update-DbcPowerBiDataSource command can take an “Environment” parameter which is useful to compare your environments.
Here is an example of how it can be used.

Import-DbcConfig -Path "$($HOME)\Documents\WindowsPowerShell\MorningCheck-Qual.json"
Invoke-DbcCheck -Check ValidDatabaseOwner, ErrorLogCount `
    -Show Summary -Passthru | Update-DbcPowerBiDataSource -Environment 'Qual'

Import-DbcConfig -Path "$($HOME)\Documents\WindowsPowerShell\MorningCheck-Prod-Listener.json"
Invoke-DbcCheck -Check ValidDatabaseOwner, ErrorLogCount `
    -Show Summary -Passthru | Update-DbcPowerBiDataSource -Environment 'Prod'

Start-DbcPowerBi

The dashboard.

Conclusion

From my experience, dbatools usage amongst DBAs has grown a lot recently. Likewise, I think dbachecks will be used more and more by DBAs in the years to come.
It’s easy to use and can save you a lot of time on your daily/weekly SQL Server checks.

This blog post was just to get you started with dbachecks. Do not hesitate to comment if you have any questions.

The article Validate your SQL Server infrastructure with dbachecks first appeared on the dbi services Blog.

Oracle write consistency bug and multi-thread de-queuing

Mon, 2020-12-14 03:21
By Franck Pachot

This was initially posted on CERN Database blog where it seems to be lost. Here is a copy thanks to web.archive.org
Additional notes:
– I’ve tested and got the same behaviour in Oracle 21c
– you will probably enjoy reading Hatem Mahmoud going further on Write consistency and DML restart

Posted by Franck Pachot on Thursday, 27 September 2018

Here is a quick test I did after encountering an abnormal behavior in write consistency and before finding some references to a bug on StackOverflow (yes, write consistency questions on StackOverflow!) and AskTOM. And a bug opened by Tom Kyte in 2011, that is still there in 18c.

The original issue was with a task management system to run jobs. Here is the simple table where all rows have a ‘NEW’ status and the goal is to have several threads processing them by updating them to the ‘HOLDING’ status and adding the process name.


set echo on
drop table DEMO;
create table DEMO (ID primary key,STATUS,NAME,CREATED)
 as select rownum,cast('NEW' as varchar2(10)),cast(null as varchar2(10)),sysdate+rownum/24/60 from xmltable('1 to 10')
/

Now here is the query that selects the 5 oldest rows in status ‘NEW’ and updates them to the ‘HOLDING’ status:


UPDATE DEMO SET NAME = 'NUMBER1', STATUS = 'HOLDING' 
WHERE ID IN (
 SELECT ID FROM (
  SELECT ID, rownum as counter 
  FROM DEMO 
  WHERE STATUS = 'NEW' 
  ORDER BY CREATED
 ) 
WHERE counter <= 5) 
;

Note that the update also sets the name of the session which has processed the rows, here ‘NUMBER1’.

Once the query started, and before the commit, I’ve run the same query from another session, but with ‘NUMBER2’.


UPDATE DEMO SET NAME = 'NUMBER2', STATUS = 'HOLDING' 
WHERE ID IN (
 SELECT ID FROM (
  SELECT ID, rownum as counter 
  FROM DEMO 
  WHERE STATUS = 'NEW' 
  ORDER BY CREATED
 ) 
WHERE counter <= 5) 
;

Of course, this waits on row lock from the first session as it has selected the same rows. Then I commit the first session, and check, from the first session what has been updated:


commit;
set pagesize 1000
select versions_operation,versions_xid,DEMO.* from DEMO versions between scn minvalue and maxvalue order by ID,2;

V VERSIONS_XID             ID STATUS     NAME       CREATED        
- ---------------- ---------- ---------- ---------- ---------------
U 0500110041040000          1 HOLDING    NUMBER1    27-SEP-18 16:48
                            1 NEW                   27-SEP-18 16:48
U 0500110041040000          2 HOLDING    NUMBER1    27-SEP-18 16:49
                            2 NEW                   27-SEP-18 16:49
U 0500110041040000          3 HOLDING    NUMBER1    27-SEP-18 16:50
                            3 NEW                   27-SEP-18 16:50
U 0500110041040000          4 HOLDING    NUMBER1    27-SEP-18 16:51
                            4 NEW                   27-SEP-18 16:51
U 0500110041040000          5 HOLDING    NUMBER1    27-SEP-18 16:52
                            5 NEW                   27-SEP-18 16:52
                            6 NEW                   27-SEP-18 16:53
                            7 NEW                   27-SEP-18 16:54
                            8 NEW                   27-SEP-18 16:55
                            9 NEW                   27-SEP-18 16:56
                           10 NEW                   27-SEP-18 16:57

I have used flashback query to see all versions of the rows. All 10 have been created and the first 5 of them have been updated by NUMBER1.

Now, my second session continues, updating to NUMBER2. I commit and look at the row versions again:


commit;
set pagesize 1000
select versions_operation,versions_xid,DEMO.* from DEMO versions between scn minvalue and maxvalue order by ID,2;

V VERSIONS_XID             ID STATUS     NAME       CREATED        
- ---------------- ---------- ---------- ---------- ---------------
U 04001B0057030000          1 HOLDING    NUMBER2    27-SEP-18 16:48
U 0500110041040000          1 HOLDING    NUMBER1    27-SEP-18 16:48
                            1 NEW                   27-SEP-18 16:48
U 04001B0057030000          2 HOLDING    NUMBER2    27-SEP-18 16:49
U 0500110041040000          2 HOLDING    NUMBER1    27-SEP-18 16:49
                            2 NEW                   27-SEP-18 16:49
U 04001B0057030000          3 HOLDING    NUMBER2    27-SEP-18 16:50
U 0500110041040000          3 HOLDING    NUMBER1    27-SEP-18 16:50
                            3 NEW                   27-SEP-18 16:50
U 04001B0057030000          4 HOLDING    NUMBER2    27-SEP-18 16:51
U 0500110041040000          4 HOLDING    NUMBER1    27-SEP-18 16:51
                            4 NEW                   27-SEP-18 16:51
U 04001B0057030000          5 HOLDING    NUMBER2    27-SEP-18 16:52
U 0500110041040000          5 HOLDING    NUMBER1    27-SEP-18 16:52
                            5 NEW                   27-SEP-18 16:52
                            6 NEW                   27-SEP-18 16:53
                            7 NEW                   27-SEP-18 16:54
                            8 NEW                   27-SEP-18 16:55
                            9 NEW                   27-SEP-18 16:56
                           10 NEW                   27-SEP-18 16:57

This is not what I expected. I wanted my second session to process the other rows, but here it seems that it has processed the same rows as the first one. What has been done by NUMBER1 has been lost and overwritten by NUMBER2. This is inconsistent, violates ACID properties, and should not happen. An SQL statement must ensure write consistency: either by locking all the rows as soon as they are read (for non-MVCC databases where reads block writes), or by re-starting the update when a mutating row is encountered. Oracle’s default behaviour is the second case: the NUMBER2 query reads the rows 1 to 5 because the changes made by NUMBER1, not committed yet, are invisible to NUMBER2. But the execution should keep track of the columns referenced in the where clause. When attempting to update a row, now that the concurrent change is visible, the update is possible only if the WHERE clause used to select the rows still selects this row. If not, the database should raise an error (this is what happens in serializable isolation level) or re-start the update when in the default statement-level consistency.

Here, probably because of the nested subquery, the write consistency is not guaranteed and this is a bug.

One workaround is not to use subqueries. However, as we need to ORDER BY the rows in order to process the oldest first, we cannot avoid the subquery. The workaround for this is to add STATUS = 'NEW' in the WHERE clause of the update, so that the update restart works correctly (see the sketch below).
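
For illustration, the update from above with this additional predicate would look like:

UPDATE DEMO SET NAME = 'NUMBER1', STATUS = 'HOLDING' 
WHERE STATUS = 'NEW' AND ID IN (
 SELECT ID FROM (
  SELECT ID, rownum as counter 
  FROM DEMO 
  WHERE STATUS = 'NEW' 
  ORDER BY CREATED
 ) 
WHERE counter <= 5) 
;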

However, the goal of multithreading those processes is to be scalable, and multiple update restarts may finally serialize all those updates.

The preferred solution for this is to ensure that the updates do not attempt to touch the same rows. This can be achieved by a SELECT … FOR UPDATE SKIP LOCKED. As this cannot be added directly to the update statement, we need a cursor. Something like this can do the job:


declare counter number:=5;
begin
 for c in (select /*+ first_rows(5) */ ID FROM DEMO 
           where STATUS = 'NEW' 
           order by CREATED
           for update skip locked)
 loop
  counter:=counter-1;
  update DEMO set NAME = 'NUMBER1', STATUS = 'HOLDING'  where ID = c.ID and STATUS = 'NEW';
  exit when counter=0;
 end loop;
end;
/
commit;

This can be optimized further but just gives an idea of what is needed for a scalable solution. Waiting for locks is not scalable.

The article Oracle write consistency bug and multi-thread de-queuing first appeared on the dbi services Blog.

Oracle 21c Security : Mandatory Profile

Fri, 2020-12-11 11:21

With Oracle 21c, it is now possible to enforce a password policy (length, number of digits…) for all pluggable databases or for specific pluggable databases via profiles. This is done by creating a mandatory profile in the root CDB and this profile will be attached to corresponding PDBs.
The mandatory profile is a generic profile that can only have a single parameter, the PASSWORD_VERIFY_FUNCTION.
The password complexity verification function of the mandatory profile is checked before the password complexity function that is associated with the user account profile.
For example, the password length defined in the mandatory profile will take precedence over any other password length defined in any other profile associated with the user.
When defined, the limit of the mandatory profile will be enforced in addition to the limits of the actual profile of the user.
A mandatory profile cannot be assigned to a user but should be attached to a PDB.

In this demonstration we will consider an instance DB21 with 3 PDBs:
-PDB1
-PDB2
-PDB3

We will create 2 mandatory profiles:
c##mand_profile_pdb1_pdb2 which will be assigned to PDB1 and PDB2
c##mand_profile_pdb3 which will be assigned to PDB3

SQL> show pdbs;

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           READ WRITE NO
         4 PDB2                           READ WRITE NO
         5 PDB3                           READ WRITE NO
SQL>

We will create two verification functions in the root container that we will associate with our mandatory profiles. The first function will check for a minimum password length of 6 characters:

SQL> CREATE OR REPLACE FUNCTION func_pdb1_2_verify_function
 ( username     varchar2,
   password     varchar2,
   old_password varchar2)
 return boolean IS
BEGIN
   if not ora_complexity_check(password, chars => 6) then
      return(false);
   end if;
   return(true);
END;
/  

Function created.

SQL>

The second function will check for a minimum password length of 10 characters:

SQL> CREATE OR REPLACE FUNCTION func_pdb3_verify_function
 ( username     varchar2,
   password     varchar2,
   old_password varchar2)
 return boolean IS
BEGIN
   if not ora_complexity_check(password, chars => 10) then
      return(false);
      end if;
   return(true);
END;
/ 

Function created.

SQL>

Now let’s create the two mandatory profiles in the root container

SQL>
CREATE MANDATORY PROFILE c##mand_profile_pdb1_pdb2
LIMIT PASSWORD_VERIFY_FUNCTION func_pdb1_2_verify_function
CONTAINER = ALL;

Profile created.

Now let's create the second mandatory profile, which we will later assign to PDB3:

SQL> CREATE MANDATORY PROFILE c##mand_profile_pdb3
LIMIT PASSWORD_VERIFY_FUNCTION func_pdb3_verify_function
CONTAINER = ALL;  

Profile created.

Remember that we want to associate the mandatory profile c##mand_profile_pdb1_pdb2 with PDB1 and PDB2. So we can first attach this profile to all PDBs:

SQL> show con_name;

CON_NAME
------------------------------
CDB$ROOT

SQL> alter system set mandatory_user_profile=c##mand_profile_pdb1_pdb2;

System altered.

SQL>

To associate the profile c##mand_profile_pdb3 with PDB3, we can set the parameter directly in PDB3:

SQL> show con_name;

CON_NAME
------------------------------
PDB3
SQL>  alter system set mandatory_user_profile=c##mand_profile_pdb3;

System altered.

SQL>

We can then verify the different values of the parameter MANDATORY_USER_PROFILE in the different PDBs

SQL> show con_name;

CON_NAME
------------------------------
PDB3
SQL>  alter system set mandatory_user_profile=c##mand_profile_pdb3;

System altered.

SQL> show parameter mandatory;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
mandatory_user_profile               string      C##MAND_PROFILE_PDB3
SQL> alter session set container=PDB1;

Session altered.

SQL> show parameter mandatory;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
mandatory_user_profile               string      C##MAND_PROFILE_PDB1_PDB2
SQL>  alter session set container=PDB2;

Session altered.

SQL> show parameter mandatory;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
mandatory_user_profile               string      C##MAND_PROFILE_PDB1_PDB2
SQL>

To test, we will try to create a user in PDB3, for example with a password shorter than 10 characters:

SQL> create user toto identified by "DGDTr##5";
create user toto identified by "DGDTr##5"
*
ERROR at line 1:
ORA-28219: password verification failed for mandatory profile
ORA-20000: password length less than 10 characters


SQL>

The article Oracle 21c Security : Mandatory Profile first appeared on the dbi services Blog.

pg_auto_failover: Failover and switchover scenarios

Fri, 2020-12-11 03:43

In the last post we had a look at the installation and setup of pg_auto_failover. We currently have one primary cluster and two replicas synchronizing from this primary cluster. But we potentially also have an issue in the setup: the monitor is running beside the primary instance on the same node, and if that node goes down the monitor is gone. What happens in that case and how can we avoid that? We also did not look at controlled switch-overs, and this is definitely something you want to have in production. From time to time you'll need to do some maintenance on one of the nodes, and switching the primary cluster to another node is very handy in such situations. Let's start with the simple case and have a look at switch-overs first.

This is the current state of the setup:

postgres@pgaf1:~$ pg_autoctl show state --pgdata /u02/pgdata/13/monitor/
  Name |  Node |                      Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+--------------------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | pgaf1.it.dbi-services.com:5432 | 0/6002408 |       yes |             primary |             primary
node_2 |     2 | pgaf2.it.dbi-services.com:5432 | 0/6002408 |       yes |           secondary |           secondary
node_3 |     3 | pgaf3.it.dbi-services.com:5432 | 0/6002408 |       yes |           secondary |           secondary

Before we attempt to do a switch-over you should be aware of your replication settings:

postgres@pgaf1:~$ pg_autoctl get formation settings --pgdata /u02/pgdata/13/monitor/
  Context |    Name |                   Setting | Value                                                       
----------+---------+---------------------------+-------------------------------------------------------------
formation | default |      number_sync_standbys | 1                                                           
  primary |  node_1 | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)'
     node |  node_1 |        candidate priority | 50                                                          
     node |  node_2 |        candidate priority | 50                                                          
     node |  node_3 |        candidate priority | 50                                                          
     node |  node_1 |        replication quorum | true                                                        
     node |  node_2 |        replication quorum | true                                                        
     node |  node_3 |        replication quorum | true                                     

What does this tell us:

  • synchronous_standby_names: We’re using synchronous replication and at least one of the two replicas needs to confirm a commit (This is a PostgreSQL setting)
  • number_sync_standbys=1: That means at least one standby needs to confirm the commit (This is a pg_auto_failover setting)
  • candidate priority=50: This specifies which replica gets promoted. At the default setting of 50 all replicas have the same chance to be selected for promotion and the monitor will pick the one with the most advanced LSN. (This is a pg_auto_failover setting)
  • replication quorum=true: This mean synchronous replication, a values of false mean asynchronous replication. (This is a pg_auto_failover setting)

You may have noticed the “formation” keyword above. A formation is a set of PostgreSQL clusters that are managed together, which means you can use the same monitor to manage multiple sets of PostgreSQL clusters. We are using the default formation in this example.
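
These replication settings can also be changed at runtime with the pg_autoctl set sub-commands. The exact syntax can vary between pg_auto_failover versions (check pg_autoctl set --help), so take the following only as a sketch with illustrative values:

# make node_3 asynchronous and exclude it from promotion (run on that node)
postgres@pgaf3:~$ pg_autoctl set node replication-quorum --pgdata /u02/pgdata/13/PG1/ false
postgres@pgaf3:~$ pg_autoctl set node candidate-priority --pgdata /u02/pgdata/13/PG1/ 0
# formation-wide setting: do not require a synchronous standby anymore
postgres@pgaf1:~$ pg_autoctl set formation number-sync-standbys --pgdata /u02/pgdata/13/monitor/ 0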

Let’s assume we need to do some maintenance on our primary node and therefore want to switch over the primary instance to another node. The command to do that is simple:

postgres@pgaf1:~$ pg_autoctl perform switchover --pgdata /u02/pgdata/13/PG1/
16:10:05 15960 INFO  Targetting group 0 in formation "default"
16:10:05 15960 INFO  Listening monitor notifications about state changes in formation "default" and group 0
16:10:05 15960 INFO  Following table displays times when notifications are received
    Time |   Name |  Node |                      Host:Port |       Current State |      Assigned State
---------+--------+-------+--------------------------------+---------------------+--------------------
16:10:05 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |             primary |            draining
16:10:05 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |            draining |            draining
16:10:05 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |           secondary |          report_lsn
16:10:05 | node_3 |     3 | pgaf3.it.dbi-services.com:5432 |           secondary |          report_lsn
16:10:06 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |          report_lsn |          report_lsn
16:10:06 | node_3 |     3 | pgaf3.it.dbi-services.com:5432 |          report_lsn |          report_lsn
16:10:06 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |          report_lsn |   prepare_promotion
16:10:06 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |   prepare_promotion |   prepare_promotion
16:10:06 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |   prepare_promotion |    stop_replication
16:10:06 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |            draining |      demote_timeout
16:10:06 | node_3 |     3 | pgaf3.it.dbi-services.com:5432 |          report_lsn |      join_secondary
16:10:06 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |      demote_timeout |      demote_timeout
16:10:06 | node_3 |     3 | pgaf3.it.dbi-services.com:5432 |      join_secondary |      join_secondary
16:10:07 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |    stop_replication |    stop_replication
16:10:07 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |    stop_replication |        wait_primary
16:10:07 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |      demote_timeout |             demoted
16:10:07 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |             demoted |             demoted
16:10:07 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |        wait_primary |        wait_primary
16:10:07 | node_3 |     3 | pgaf3.it.dbi-services.com:5432 |      join_secondary |           secondary
16:10:07 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |        wait_primary |             primary
16:10:07 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |             demoted |          catchingup
16:10:07 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |        wait_primary |        join_primary
16:10:07 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |        join_primary |        join_primary
16:10:08 | node_3 |     3 | pgaf3.it.dbi-services.com:5432 |           secondary |           secondary
16:10:08 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |          catchingup |          catchingup
16:10:08 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |          catchingup |           secondary
16:10:08 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |        join_primary |             primary
16:10:08 | node_1 |     1 | pgaf1.it.dbi-services.com:5432 |           secondary |           secondary
16:10:08 | node_2 |     2 | pgaf2.it.dbi-services.com:5432 |             primary |             primary
postgres@pgaf1:~$ 

You get the progress messages on the screen, so you can actually see what happens. As the services are started with systemd, you can also have a look at the journal:

-- Logs begin at Thu 2020-12-10 15:17:38 CET, end at Thu 2020-12-10 16:11:26 CET. --
Dec 10 16:10:08 pgaf1 pg_autoctl[327]: 16:10:08 399 INFO  FSM transition from "catchingup" to "secondary": Convinced the monitor that I'm
Dec 10 16:10:08 pgaf1 pg_autoctl[327]: 16:10:08 399 INFO  Transition complete: current state is now "secondary"
Dec 10 16:10:08 pgaf1 pg_autoctl[341]: 16:10:08 397 INFO  node 1 "node_1" (pgaf1.it.dbi-services.com:5432) reported new state "secondary"
Dec 10 16:10:08 pgaf1 pg_autoctl[341]: 16:10:08 397 INFO  New state for node 1 "node_1" (pgaf1.it.dbi-services.com:5432): secondary ➜ sec
Dec 10 16:10:08 pgaf1 pg_autoctl[327]: 16:10:08 399 INFO  New state for this node (node 1, "node_1") (pgaf1.it.dbi-services.com:5432): se
Dec 10 16:10:08 pgaf1 pg_autoctl[341]: 16:10:08 397 INFO  node 2 "node_2" (pgaf2.it.dbi-services.com:5432) reported new state "primary"
Dec 10 16:10:08 pgaf1 pg_autoctl[341]: 16:10:08 397 INFO  New state for node 2 "node_2" (pgaf2.it.dbi-services.com:5432): primary ➜ prima
Dec 10 16:10:08 pgaf1 pg_autoctl[327]: 16:10:08 399 INFO  New state for node 2 "node_2" (pgaf2.it.dbi-services.com:5432): primary ➜ prima
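
If you only want the messages of a specific unit, something like the following should do (using the systemd unit names created in the previous post):

postgres@pgaf1:~$ sudo journalctl -u postgresp1.service --since "16:10"
postgres@pgaf1:~$ sudo journalctl -u pgautofailover.service -f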

The second node was selected as the new primary, and we can of course confirm that:

postgres@pgaf1:~$ pg_autoctl show state --pgdata /u02/pgdata/13/monitor/
  Name |  Node |                      Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+--------------------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | pgaf1.it.dbi-services.com:5432 | 0/60026F8 |       yes |           secondary |           secondary
node_2 |     2 | pgaf2.it.dbi-services.com:5432 | 0/60026F8 |       yes |             primary |             primary
node_3 |     3 | pgaf3.it.dbi-services.com:5432 | 0/60026F8 |       yes |           secondary |           secondary

postgres@pgaf1:~$ 
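
An unattended failover can be triggered in much the same way, should you want to test the election without rebooting anything; a minimal sketch:

postgres@pgaf1:~$ pg_autoctl perform failover --pgdata /u02/pgdata/13/PG1/

The monitor then drives the same state machine as above and promotes the most suitable secondary.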

Next test: what happens when we reboot a node that currently runs a replica? Let’s reboot pgaf3, as it is currently a replica and does not run the monitor:

postgres@pgaf3:~$ sudo reboot
postgres@pgaf3:~$ Connection to 192.168.22.192 closed by remote host.
Connection to 192.168.22.192 closed.

Watching the state, the “Reachable” status changes to “no” for the third instance and its LSN falls behind:

postgres@pgaf1:~$ pg_autoctl show state --pgdata /u02/pgdata/13/monitor/
  Name |  Node |                      Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+--------------------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | pgaf1.it.dbi-services.com:5432 | 0/60026F8 |       yes |           secondary |           secondary
node_2 |     2 | pgaf2.it.dbi-services.com:5432 | 0/60026F8 |       yes |             primary |             primary
node_3 |     3 | pgaf3.it.dbi-services.com:5432 | 0/6000000 |        no |           secondary |           secondary

Once it is back, the replica is brought back into the configuration and all is fine:

postgres@pgaf1:~$ pg_autoctl show state --pgdata /u02/pgdata/13/monitor/
  Name |  Node |                      Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+--------------------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | pgaf1.it.dbi-services.com:5432 | 0/60026F8 |       yes |           secondary |           secondary
node_2 |     2 | pgaf2.it.dbi-services.com:5432 | 0/60026F8 |       yes |             primary |             primary
node_3 |     3 | pgaf3.it.dbi-services.com:5432 | 0/6000000 |       yes |           secondary |           secondary

...
postgres@pgaf1:~$ pg_autoctl show state --pgdata /u02/pgdata/13/monitor/
  Name |  Node |                      Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+--------------------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | pgaf1.it.dbi-services.com:5432 | 0/6013120 |       yes |           secondary |           secondary
node_2 |     2 | pgaf2.it.dbi-services.com:5432 | 0/6013120 |       yes |             primary |             primary
node_3 |     3 | pgaf3.it.dbi-services.com:5432 | 0/6013120 |       yes |           secondary |           secondary

But what happens if we shut down the monitor node?

postgres@pgaf1:~$ sudo systemctl poweroff
postgres@pgaf1:~$ Connection to 192.168.22.190 closed by remote host.
Connection to 192.168.22.190 closed.

Checking the status on the node which currently hosts the primary cluster:

postgres@pgaf2:~$ pg_autoctl show state --pgdata /u02/pgdata/13/PG1/
10:26:52 1293 WARN  Failed to connect to "postgres://autoctl_node@pgaf1.it.dbi-services.com:5433/pg_auto_failover?sslmode=require", retrying until the server is ready
10:26:52 1293 ERROR Connection to database failed: timeout expired
10:26:52 1293 ERROR Failed to connect to "postgres://autoctl_node@pgaf1.it.dbi-services.com:5433/pg_auto_failover?sslmode=require" after 1 attempts in 2 seconds, pg_autoctl stops retrying now
10:26:52 1293 ERROR Failed to retrieve current state from the monitor

As the monitor is down, we can no longer ask for the status. The primary and the remaining replica are still up and running, but we have lost the possibility to interact with pg_auto_failover. Booting up the monitor node brings it back into the game:

postgres@pgaf2:~$ pg_autoctl show state --pgdata /u02/pgdata/13/PG1/
  Name |  Node |                      Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+--------------------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | pgaf1.it.dbi-services.com:5432 | 0/6000000 |       yes |           secondary |           secondary
node_2 |     2 | pgaf2.it.dbi-services.com:5432 | 0/6013240 |       yes |             primary |             primary
node_3 |     3 | pgaf3.it.dbi-services.com:5432 | 0/6013240 |       yes |           secondary |           secondary

This has a consequence: the monitor should not run on any of the PostgreSQL nodes but on a separate node dedicated to the monitor. As you can manage more than one HA setup with the same monitor, this should not be an issue, though. But it also means that the monitor is a single point of failure, and its health is critical for pg_auto_failover.

The article pg_auto_failover: Failover and switchover scenarios appeared first on the dbi services blog.

Easy failover and switchover with pg_auto_failover

Thu, 2020-12-10 08:28

One of the really cool things with PostgreSQL is that you have plenty of choices when it comes to tooling. For high availability we usually go with Patroni, but there is also pg_auto_failover, and this will be the topic of this post. Because of the recent announcement around CentOS we’ll go with Debian this time. What is already prepared is the PostgreSQL installation (version 13.1), but nothing else. We start from scratch to see if “optimized for simplicity and correctness”, as stated on the GitHub page, holds true.

This is the setup we’ll start with:

Hostname                    IP-Address       Initial role
pgaf1.it.dbi-services.com   192.168.22.190   Primary and pg_auto_failover monitor
pgaf2.it.dbi-services.com   192.168.22.191   First replica
pgaf3.it.dbi-services.com   192.168.22.192   Second replica

As said above, all three nodes have PostgreSQL 13.1 already installed at this location (PostgreSQL was installed from source code, but that should not really matter):

postgres@pgaf1:~$ ls /u01/app/postgres/product/13/db_1/
bin  include  lib  share

What I did in addition is to create ssh keys and then copy those from each machine to all nodes so that password-less ssh connections are available between the nodes (here is the example from the first node):

postgres@pgaf1:~$ ssh-keygen
postgres@pgaf1:~$ ssh-copy-id postgres@pgaf1
postgres@pgaf1:~$ ssh-copy-id postgres@pgaf2
postgres@pgaf1:~$ ssh-copy-id postgres@pgaf3

For installing pg_auto_failover from source make sure that pg_config is in your path:

postgres@pgaf1:~$ which pg_config
/u01/app/postgres/product/13/db_1//bin/pg_config

Once that is ready, getting pg_auto_failover installed is quite simple:

postgres@pgaf1:~$ git clone https://github.com/citusdata/pg_auto_failover.git
Cloning into 'pg_auto_failover'...
remote: Enumerating objects: 252, done.
remote: Counting objects: 100% (252/252), done.
remote: Compressing objects: 100% (137/137), done.
remote: Total 8131 (delta 134), reused 174 (delta 115), pack-reused 7879
Receiving objects: 100% (8131/8131), 5.07 MiB | 1.25 MiB/s, done.
Resolving deltas: 100% (6022/6022), done.
postgres@pgaf1:~$ cd pg_auto_failover/
postgres@pgaf1:~$ make
make -C src/monitor/ all
make[1]: Entering directory '/home/postgres/pg_auto_failover/src/monitor'
gcc -std=c99 -D_GNU_SOURCE -g -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -O2 -Wformat -Wall -Werror=implicit-int -Werror=implicit-function-declaration -Werror=return-type -Wno-declaration-after-statement -Wno-missing-braces  -fPIC -std=c99 -Wall -Werror -Wno-unused-parameter -Iinclude -I/u01/app/postgres/product/13/db_1/include -g -I. -I./ -I/u01/app/postgres/product/13/db_1/include/server -I/u01/app/postgres/product/13/db_1/include/internal  -D_GNU_SOURCE -I/usr/include/libxml2   -c -o metadata.o metadata.c
...
make[2]: Leaving directory '/home/postgres/pg_auto_failover/src/bin/pg_autoctl'
make[1]: Leaving directory '/home/postgres/pg_auto_failover/src/bin'
postgres@pgaf1:~$ make install
make -C src/monitor/ all
make[1]: Entering directory '/home/postgres/pg_auto_failover/src/monitor'
make[1]: Nothing to be done for 'all'.
...

This needs to be done on all hosts, of course. You will notice a new extension and new binaries in your PostgreSQL installation:

postgres@pgaf1:~$ ls /u01/app/postgres/product/13/db_1/share/extension/*pgauto*
/u01/app/postgres/product/13/db_1/share/extension/pgautofailover--1.0--1.1.sql
/u01/app/postgres/product/13/db_1/share/extension/pgautofailover--1.0.sql
/u01/app/postgres/product/13/db_1/share/extension/pgautofailover--1.1--1.2.sql
/u01/app/postgres/product/13/db_1/share/extension/pgautofailover--1.2--1.3.sql
/u01/app/postgres/product/13/db_1/share/extension/pgautofailover--1.3--1.4.sql
/u01/app/postgres/product/13/db_1/share/extension/pgautofailover--1.4--dummy.sql
/u01/app/postgres/product/13/db_1/share/extension/pgautofailover--1.4.sql
/u01/app/postgres/product/13/db_1/share/extension/pgautofailover.control
postgres@pgaf1:~$ ls /u01/app/postgres/product/13/db_1/bin/*auto*
/u01/app/postgres/product/13/db_1/bin/pg_autoctl

Having that available, we’ll need to initialize the pg_auto_failover monitor, which is responsible for assigning roles and health-checking. We’ll do that on the first node:

postgres@pgaf1:~$ export PGDATA=/u02/pgdata/13/monitor
postgres@pgaf1:~$ export PGPORT=5433
postgres@pgaf1:~$ pg_autoctl create monitor --ssl-self-signed --hostname pgaf1.it.dbi-services.com --auth trust --run
14:45:40 13184 INFO  Using default --ssl-mode "require"
14:45:40 13184 INFO  Using --ssl-self-signed: pg_autoctl will create self-signed certificates, allowing for encrypted network traffic
14:45:40 13184 WARN  Self-signed certificates provide protection against eavesdropping; this setup does NOT protect against Man-In-The-Middle attacks nor Impersonation attacks.
14:45:40 13184 WARN  See https://www.postgresql.org/docs/current/libpq-ssl.html for details
14:45:40 13184 INFO  Initialising a PostgreSQL cluster at "/u02/pgdata/13/monitor"
14:45:40 13184 INFO  /u01/app/postgres/product/13/db_1/bin/pg_ctl initdb -s -D /u02/pgdata/13/monitor --option '--auth=trust'
14:45:42 13184 INFO   /usr/bin/openssl req -new -x509 -days 365 -nodes -text -out /u02/pgdata/13/monitor/server.crt -keyout /u02/pgdata/13/monitor/server.key -subj "/CN=pgaf1.it.dbi-services.com"
14:45:42 13184 INFO  Started pg_autoctl postgres service with pid 13204
14:45:42 13184 INFO  Started pg_autoctl listener service with pid 13205
14:45:42 13204 INFO   /u01/app/postgres/product/13/db_1/bin/pg_autoctl do service postgres --pgdata /u02/pgdata/13/monitor -v
14:45:42 13209 INFO   /u01/app/postgres/product/13/db_1/bin/postgres -D /u02/pgdata/13/monitor -p 5433 -h *
14:45:42 13205 ERROR Connection to database failed: could not connect to server: No such file or directory
14:45:42 13205 ERROR    Is the server running locally and accepting
14:45:42 13205 ERROR    connections on Unix domain socket "/tmp/.s.PGSQL.5433"?
14:45:42 13205 ERROR Failed to connect to local Postgres database at "port=5433 dbname=postgres", see above for details
14:45:42 13205 ERROR Failed to create user "autoctl" on local postgres server
14:45:42 13184 ERROR pg_autoctl service listener exited with exit status 12
14:45:42 13184 INFO  Restarting service listener
14:45:42 13204 INFO  Postgres is now serving PGDATA "/u02/pgdata/13/monitor" on port 5433 with pid 13209
14:45:43 13221 WARN  NOTICE:  installing required extension "btree_gist"
14:45:43 13221 INFO  Granting connection privileges on 192.168.22.0/24
14:45:43 13221 INFO  Your pg_auto_failover monitor instance is now ready on port 5433.
14:45:43 13221 INFO  Monitor has been successfully initialized.
14:45:43 13221 INFO   /u01/app/postgres/product/13/db_1/bin/pg_autoctl do service listener --pgdata /u02/pgdata/13/monitor -v
14:45:43 13221 INFO  Managing the monitor at postgres://autoctl_node@pgaf1.it.dbi-services.com:5433/pg_auto_failover?sslmode=require
14:45:43 13221 INFO  Reloaded the new configuration from "/home/postgres/.config/pg_autoctl/u02/pgdata/13/monitor/pg_autoctl.cfg"
14:45:44 13221 INFO  The version of extension "pgautofailover" is "1.4" on the monitor
14:45:44 13221 INFO  Contacting the monitor to LISTEN to its events.

This created a standard PostgreSQL cluster in the background:

postgres@pgaf1:~$ ls /u02/pgdata/13/monitor/
base              pg_dynshmem    pg_notify     pg_stat_tmp  pg_wal                         postmaster.opts
current_logfiles  pg_hba.conf    pg_replslot   pg_subtrans  pg_xact                        postmaster.pid
global            pg_ident.conf  pg_serial     pg_tblspc    postgresql.auto.conf           server.crt
log               pg_logical     pg_snapshots  pg_twophase  postgresql-auto-failover.conf  server.key
pg_commit_ts      pg_multixact   pg_stat       PG_VERSION   postgresql.conf                startup.log
postgres@pgaf1:~$ ps -ef | grep "postgres \-D"
postgres 13209 13204  0 14:45 pts/0    00:00:00 /u01/app/postgres/product/13/db_1/bin/postgres -D /u02/pgdata/13/monitor -p 5433 -h *

Before we can initialize the primary instance we need to get the connection string to the monitor:

postgres@pgaf1:~$ pg_autoctl show uri --monitor --pgdata /u02/pgdata/13/monitor/
postgres://autoctl_node@pgaf1.it.dbi-services.com:5433/pg_auto_failover?sslmode=require

Create the primary:

postgres@pgaf1:~$ pg_autoctl create postgres \
>     --hostname pgaf1.it.dbi-services.com \
>     --auth trust \
>     --ssl-self-signed \
>     --monitor 'postgres://autoctl_node@pgaf1.it.dbi-services.com:5433/pg_auto_failover?sslmode=require' \
>     --run
14:52:11 13354 INFO  Using default --ssl-mode "require"
14:52:11 13354 INFO  Using --ssl-self-signed: pg_autoctl will create self-signed certificates, allowing for encrypted network traffic
14:52:11 13354 WARN  Self-signed certificates provide protection against eavesdropping; this setup does NOT protect against Man-In-The-Middle attacks nor Impersonation attacks.
14:52:11 13354 WARN  See https://www.postgresql.org/docs/current/libpq-ssl.html for details
14:52:11 13354 INFO  Started pg_autoctl postgres service with pid 13356
14:52:11 13354 INFO  Started pg_autoctl node-active service with pid 13357
14:52:11 13356 INFO   /u01/app/postgres/product/13/db_1/bin/pg_autoctl do service postgres --pgdata /u02/pgdata/13/PG1 -v
14:52:11 13357 INFO  Registered node 1 (pgaf1.it.dbi-services.com:5432) with name "node_1" in formation "default", group 0, state "single"
14:52:11 13357 INFO  Writing keeper state file at "/home/postgres/.local/share/pg_autoctl/u02/pgdata/13/PG1/pg_autoctl.state"
14:52:11 13357 INFO  Writing keeper init state file at "/home/postgres/.local/share/pg_autoctl/u02/pgdata/13/PG1/pg_autoctl.init"
14:52:11 13357 INFO  Successfully registered as "single" to the monitor.
14:52:11 13357 INFO  FSM transition from "init" to "single": Start as a single node
14:52:11 13357 INFO  Initialising postgres as a primary
14:52:11 13357 INFO  Initialising a PostgreSQL cluster at "/u02/pgdata/13/PG1"
14:52:11 13357 INFO  /u01/app/postgres/product/13/db_1/bin/pg_ctl initdb -s -D /u02/pgdata/13/PG1 --option '--auth=trust'
14:52:14 13357 INFO   /usr/bin/openssl req -new -x509 -days 365 -nodes -text -out /u02/pgdata/13/PG1/server.crt -keyout /u02/pgdata/13/PG1/server.key -subj "/CN=pgaf1.it.dbi-services.com"
14:52:14 13385 INFO   /u01/app/postgres/product/13/db_1/bin/postgres -D /u02/pgdata/13/PG1 -p 5432 -h *
14:52:14 13357 INFO  CREATE DATABASE postgres;
14:52:14 13356 INFO  Postgres is now serving PGDATA "/u02/pgdata/13/PG1" on port 5432 with pid 13385
14:52:14 13357 INFO  The database "postgres" already exists, skipping.
14:52:14 13357 INFO  CREATE EXTENSION pg_stat_statements;
14:52:14 13357 INFO   /usr/bin/openssl req -new -x509 -days 365 -nodes -text -out /u02/pgdata/13/PG1/server.crt -keyout /u02/pgdata/13/PG1/server.key -subj "/CN=pgaf1.it.dbi-services.com"
14:52:14 13357 INFO  Contents of "/u02/pgdata/13/PG1/postgresql-auto-failover.conf" have changed, overwriting
14:52:14 13357 INFO  Transition complete: current state is now "single"
14:52:14 13357 INFO  keeper has been successfully initialized.
14:52:14 13357 INFO   /u01/app/postgres/product/13/db_1/bin/pg_autoctl do service node-active --pgdata /u02/pgdata/13/PG1 -v
14:52:14 13357 INFO  Reloaded the new configuration from "/home/postgres/.config/pg_autoctl/u02/pgdata/13/PG1/pg_autoctl.cfg"
14:52:14 13357 INFO  pg_autoctl service is running, current state is "single"

Repeating the same command on the second node (with a different --hostname) will initialize the first replica:

postgres@pgaf2:~$ export PGDATA=/u02/pgdata/13/PG1
postgres@pgaf2:~$ export PGPORT=5432
postgres@pgaf2:~$ pg_autoctl create postgres \
>     --hostname pgaf2.it.dbi-services.com \
>     --auth trust \
>     --ssl-self-signed \
>     --monitor 'postgres://autoctl_node@pgaf1.it.dbi-services.com:5433/pg_auto_failover?sslmode=require' \
>     --run
14:54:09 13010 INFO  Using default --ssl-mode "require"
14:54:09 13010 INFO  Using --ssl-self-signed: pg_autoctl will create self-signed certificates, allowing for encrypted network traffic
14:54:09 13010 WARN  Self-signed certificates provide protection against eavesdropping; this setup does NOT protect against Man-In-The-Middle attacks nor Impersonation attacks.
14:54:09 13010 WARN  See https://www.postgresql.org/docs/current/libpq-ssl.html for details
14:54:09 13010 INFO  Started pg_autoctl postgres service with pid 13012
14:54:09 13010 INFO  Started pg_autoctl node-active service with pid 13013
14:54:09 13012 INFO   /u01/app/postgres/product/13/db_1/bin/pg_autoctl do service postgres --pgdata /u02/pgdata/13/PG1 -v
14:54:09 13013 INFO  Registered node 2 (pgaf2.it.dbi-services.com:5432) with name "node_2" in formation "default", group 0, state "wait_standby"
14:54:09 13013 INFO  Writing keeper state file at "/home/postgres/.local/share/pg_autoctl/u02/pgdata/13/PG1/pg_autoctl.state"
14:54:09 13013 INFO  Writing keeper init state file at "/home/postgres/.local/share/pg_autoctl/u02/pgdata/13/PG1/pg_autoctl.init"
14:54:09 13013 INFO  Successfully registered as "wait_standby" to the monitor.
14:54:09 13013 INFO  FSM transition from "init" to "wait_standby": Start following a primary
14:54:09 13013 INFO  Transition complete: current state is now "wait_standby"
14:54:09 13013 INFO  New state for node 1 "node_1" (pgaf1.it.dbi-services.com:5432): single ➜ wait_primary
14:54:09 13013 INFO  New state for node 1 "node_1" (pgaf1.it.dbi-services.com:5432): wait_primary ➜ wait_primary
14:54:09 13013 INFO  Still waiting for the monitor to drive us to state "catchingup"
14:54:09 13013 WARN  Please make sure that the primary node is currently running `pg_autoctl run` and contacting the monitor.
14:54:09 13013 INFO  FSM transition from "wait_standby" to "catchingup": The primary is now ready to accept a standby
14:54:09 13013 INFO  Initialising PostgreSQL as a hot standby
14:54:09 13013 INFO   /u01/app/postgres/product/13/db_1/bin/pg_basebackup -w -d application_name=pgautofailover_standby_2 host=pgaf1.it.dbi-services.com port=5432 user=pgautofailover_replicator sslmode=require --pgdata /u02/pgdata/13/backup/node_2 -U pgautofailover_replicator --verbose --progress --max-rate 100M --wal-method=stream --slot pgautofailover_standby_2
14:54:09 13013 INFO  pg_basebackup: initiating base backup, waiting for checkpoint to complete
14:54:15 13013 INFO  pg_basebackup: checkpoint completed
14:54:15 13013 INFO  pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
14:54:15 13013 INFO  pg_basebackup: starting background WAL receiver
14:54:15 13013 INFO      0/23396 kB (0%), 0/1 tablespace (...ta/13/backup/node_2/backup_label)
14:54:16 13013 INFO   1752/23396 kB (7%), 0/1 tablespace (...ata/13/backup/node_2/base/1/2610)
14:54:16 13013 INFO  23406/23406 kB (100%), 0/1 tablespace (.../backup/node_2/global/pg_control)
14:54:16 13013 INFO  23406/23406 kB (100%), 1/1 tablespace                                         
14:54:16 13013 INFO  pg_basebackup:
14:54:16 13013 INFO   
14:54:16 13013 INFO  write-ahead log end point: 0/2000100
14:54:16 13013 INFO  pg_basebackup:
14:54:16 13013 INFO   
14:54:16 13013 INFO  waiting for background process to finish streaming ...
14:54:16 13013 INFO  pg_basebackup: syncing data to disk ...
14:54:17 13013 INFO  pg_basebackup: renaming backup_manifest.tmp to backup_manifest
14:54:17 13013 INFO  pg_basebackup: base backup completed
14:54:17 13013 INFO  Creating the standby signal file at "/u02/pgdata/13/PG1/standby.signal", and replication setup at "/u02/pgdata/13/PG1/postgresql-auto-failover-standby.conf"
14:54:17 13013 INFO   /usr/bin/openssl req -new -x509 -days 365 -nodes -text -out /u02/pgdata/13/PG1/server.crt -keyout /u02/pgdata/13/PG1/server.key -subj "/CN=pgaf2.it.dbi-services.com"
14:54:17 13021 INFO   /u01/app/postgres/product/13/db_1/bin/postgres -D /u02/pgdata/13/PG1 -p 5432 -h *
14:54:19 13013 INFO  PostgreSQL started on port 5432
14:54:19 13013 INFO  Fetched current list of 1 other nodes from the monitor to update HBA rules, including 1 changes.
14:54:19 13013 INFO  Ensuring HBA rules for node 1 "node_1" (pgaf1.it.dbi-services.com:5432)
14:54:19 13013 INFO  Transition complete: current state is now "catchingup"
14:54:20 13012 INFO  Postgres is now serving PGDATA "/u02/pgdata/13/PG1" on port 5432 with pid 13021
14:54:20 13013 INFO  keeper has been successfully initialized.
14:54:20 13013 INFO   /u01/app/postgres/product/13/db_1/bin/pg_autoctl do service node-active --pgdata /u02/pgdata/13/PG1 -v
14:54:20 13013 INFO  Reloaded the new configuration from "/home/postgres/.config/pg_autoctl/u02/pgdata/13/PG1/pg_autoctl.cfg"
14:54:20 13013 INFO  pg_autoctl service is running, current state is "catchingup"
14:54:20 13013 INFO  Fetched current list of 1 other nodes from the monitor to update HBA rules, including 1 changes.
14:54:20 13013 INFO  Ensuring HBA rules for node 1 "node_1" (pgaf1.it.dbi-services.com:5432)
14:54:21 13013 INFO  Monitor assigned new state "secondary"
14:54:21 13013 INFO  FSM transition from "catchingup" to "secondary": Convinced the monitor that I'm up and running, and eligible for promotion again
14:54:21 13013 INFO  Creating replication slot "pgautofailover_standby_1"
14:54:21 13013 INFO  Transition complete: current state is now "secondary"
14:54:21 13013 INFO  New state for node 1 "node_1" (pgaf1.it.dbi-services.com:5432): primary ➜ primary

The last lines of the output confirm that pgaf1 is the primary and pgaf2 now hosts a replica. Let’s do the same on the third node:

postgres@pgaf3:~$ pg_autoctl create postgres \
>     --hostname pgaf3.it.dbi-services.com \
>     --auth trust \
>     --ssl-self-signed \
>     --monitor 'postgres://autoctl_node@pgaf1.it.dbi-services.com:5433/pg_auto_failover?sslmode=require' \
>     --run
14:57:19 12831 INFO  Using default --ssl-mode "require"
14:57:19 12831 INFO  Using --ssl-self-signed: pg_autoctl will create self-signed certificates, allowing for encrypted network traffic
14:57:19 12831 WARN  Self-signed certificates provide protection against eavesdropping; this setup does NOT protect against Man-In-The-Middle attacks nor Impersonation attacks.
14:57:19 12831 WARN  See https://www.postgresql.org/docs/current/libpq-ssl.html for details
14:57:19 12831 INFO  Started pg_autoctl postgres service with pid 12833
14:57:19 12831 INFO  Started pg_autoctl node-active service with pid 12834
14:57:19 12833 INFO   /u01/app/postgres/product/13/db_1/bin/pg_autoctl do service postgres --pgdata /u02/pgdata/13/PG1 -v
14:57:19 12834 INFO  Registered node 3 (pgaf3.it.dbi-services.com:5432) with name "node_3" in formation "default", group 0, state "wait_standby"
14:57:19 12834 INFO  Writing keeper state file at "/home/postgres/.local/share/pg_autoctl/u02/pgdata/13/PG1/pg_autoctl.state"
14:57:19 12834 INFO  Writing keeper init state file at "/home/postgres/.local/share/pg_autoctl/u02/pgdata/13/PG1/pg_autoctl.init"
14:57:19 12834 INFO  Successfully registered as "wait_standby" to the monitor.
14:57:19 12834 INFO  FSM transition from "init" to "wait_standby": Start following a primary
14:57:19 12834 INFO  Transition complete: current state is now "wait_standby"
14:57:19 12834 INFO  New state for node 1 "node_1" (pgaf1.it.dbi-services.com:5432): primary ➜ join_primary
14:57:20 12834 INFO  New state for node 1 "node_1" (pgaf1.it.dbi-services.com:5432): join_primary ➜ join_primary
14:57:20 12834 INFO  Still waiting for the monitor to drive us to state "catchingup"
14:57:20 12834 WARN  Please make sure that the primary node is currently running `pg_autoctl run` and contacting the monitor.
14:57:20 12834 INFO  FSM transition from "wait_standby" to "catchingup": The primary is now ready to accept a standby
14:57:20 12834 INFO  Initialising PostgreSQL as a hot standby
14:57:20 12834 INFO   /u01/app/postgres/product/13/db_1/bin/pg_basebackup -w -d application_name=pgautofailover_standby_3 host=pgaf1.it.dbi-services.com port=5432 user=pgautofailover_replicator sslmode=require --pgdata /u02/pgdata/13/backup/node_3 -U pgautofailover_replicator --verbose --progress --max-rate 100M --wal-method=stream --slot pgautofailover_standby_3
14:57:20 12834 INFO  pg_basebackup: initiating base backup, waiting for checkpoint to complete
14:57:20 12834 INFO  pg_basebackup: checkpoint completed
14:57:20 12834 INFO  pg_basebackup: write-ahead log start point: 0/4000028 on timeline 1
14:57:20 12834 INFO  pg_basebackup: starting background WAL receiver
14:57:20 12834 INFO      0/23397 kB (0%), 0/1 tablespace (...ta/13/backup/node_3/backup_label)
14:57:20 12834 INFO  23406/23406 kB (100%), 0/1 tablespace (.../backup/node_3/global/pg_control)
14:57:20 12834 INFO  23406/23406 kB (100%), 1/1 tablespace                                         
14:57:20 12834 INFO  pg_basebackup: write-ahead log end point: 0/4000100
14:57:20 12834 INFO  pg_basebackup: waiting for background process to finish streaming ...
14:57:20 12834 INFO  pg_basebackup: syncing data to disk ...
14:57:22 12834 INFO  pg_basebackup: renaming backup_manifest.tmp to backup_manifest
14:57:22 12834 INFO  pg_basebackup: base backup completed
14:57:22 12834 INFO  Creating the standby signal file at "/u02/pgdata/13/PG1/standby.signal", and replication setup at "/u02/pgdata/13/PG1/postgresql-auto-failover-standby.conf"
14:57:22 12834 INFO   /usr/bin/openssl req -new -x509 -days 365 -nodes -text -out /u02/pgdata/13/PG1/server.crt -keyout /u02/pgdata/13/PG1/server.key -subj "/CN=pgaf3.it.dbi-services.com"
14:57:22 12841 INFO   /u01/app/postgres/product/13/db_1/bin/postgres -D /u02/pgdata/13/PG1 -p 5432 -h *
14:57:22 12834 INFO  PostgreSQL started on port 5432
14:57:22 12834 INFO  Fetched current list of 2 other nodes from the monitor to update HBA rules, including 2 changes.
14:57:22 12834 INFO  Ensuring HBA rules for node 1 "node_1" (pgaf1.it.dbi-services.com:5432)
14:57:22 12834 INFO  Ensuring HBA rules for node 2 "node_2" (pgaf2.it.dbi-services.com:5432)
14:57:22 12834 ERROR Connection to database failed: could not connect to server: No such file or directory
14:57:22 12834 ERROR    Is the server running locally and accepting
14:57:22 12834 ERROR    connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
14:57:22 12834 ERROR Failed to connect to local Postgres database at "port=5432 dbname=postgres", see above for details
14:57:22 12834 ERROR Failed to reload the postgres configuration after adding the standby user to pg_hba
14:57:22 12834 ERROR Failed to update the HBA entries for the new elements in the our formation "default" and group 0
14:57:22 12834 ERROR Failed to update HBA rules after a base backup
14:57:22 12834 ERROR Failed to transition from state "wait_standby" to state "catchingup", see above.
14:57:22 12831 ERROR pg_autoctl service node-active exited with exit status 12
14:57:22 12831 INFO  Restarting service node-active
14:57:22 12845 INFO  Continuing from a previous `pg_autoctl create` failed attempt
14:57:22 12845 INFO  PostgreSQL state at registration time was: PGDATA does not exists
14:57:22 12845 INFO  FSM transition from "wait_standby" to "catchingup": The primary is now ready to accept a standby
14:57:22 12845 INFO  Initialising PostgreSQL as a hot standby
14:57:22 12845 INFO  Target directory exists: "/u02/pgdata/13/PG1", stopping PostgreSQL
14:57:24 12833 INFO  Postgres is now serving PGDATA "/u02/pgdata/13/PG1" on port 5432 with pid 12841
14:57:24 12833 INFO  Stopping pg_autoctl postgres service
14:57:24 12833 INFO  /u01/app/postgres/product/13/db_1/bin/pg_ctl --pgdata /u02/pgdata/13/PG1 --wait stop --mode fast
14:57:24 12845 INFO   /u01/app/postgres/product/13/db_1/bin/pg_basebackup -w -d application_name=pgautofailover_standby_3 host=pgaf1.it.dbi-services.com port=5432 user=pgautofailover_replicator sslmode=require --pgdata /u02/pgdata/13/backup/node_3 -U pgautofailover_replicator --verbose --progress --max-rate 100M --wal-method=stream --slot pgautofailover_standby_3
14:57:24 12845 INFO  pg_basebackup:
14:57:24 12845 INFO   
14:57:24 12845 INFO  initiating base backup, waiting for checkpoint to complete
14:57:24 12845 INFO  pg_basebackup:
14:57:24 12845 INFO   
14:57:24 12845 INFO  checkpoint completed
14:57:24 12845 INFO  pg_basebackup:
14:57:24 12845 INFO   
14:57:24 12845 INFO  write-ahead log start point: 0/5000028 on timeline 1
14:57:24 12845 INFO  pg_basebackup:
14:57:24 12845 INFO   
14:57:24 12845 INFO  starting background WAL receiver
14:57:24 12845 INFO      0/23397 kB (0%), 0/1 tablespace (...ta/13/backup/node_3/backup_label)
14:57:25 12845 INFO  16258/23397 kB (69%), 0/1 tablespace (...3/backup/node_3/base/12662/12512)
14:57:25 12845 INFO  23406/23406 kB (100%), 0/1 tablespace (.../backup/node_3/global/pg_control)
14:57:25 12845 INFO  23406/23406 kB (100%), 1/1 tablespace                                         
14:57:25 12845 INFO  pg_basebackup: write-ahead log end point: 0/5000100
14:57:25 12845 INFO  pg_basebackup: waiting for background process to finish streaming ...
14:57:25 12845 INFO  pg_basebackup: syncing data to disk ...
14:57:27 12845 INFO  pg_basebackup:
14:57:27 12845 INFO   
14:57:27 12845 INFO  renaming backup_manifest.tmp to backup_manifest
14:57:27 12845 INFO  pg_basebackup:
14:57:27 12845 INFO   
14:57:27 12845 INFO  base backup completed
14:57:27 12845 INFO  Creating the standby signal file at "/u02/pgdata/13/PG1/standby.signal", and replication setup at "/u02/pgdata/13/PG1/postgresql-auto-failover-standby.conf"
14:57:27 12845 INFO   /usr/bin/openssl req -new -x509 -days 365 -nodes -text -out /u02/pgdata/13/PG1/server.crt -keyout /u02/pgdata/13/PG1/server.key -subj "/CN=pgaf3.it.dbi-services.com"
14:57:27 12881 INFO   /u01/app/postgres/product/13/db_1/bin/postgres -D /u02/pgdata/13/PG1 -p 5432 -h *
14:57:29 12845 INFO  PostgreSQL started on port 5432
14:57:29 12845 INFO  Fetched current list of 2 other nodes from the monitor to update HBA rules, including 2 changes.
14:57:29 12845 INFO  Ensuring HBA rules for node 1 "node_1" (pgaf1.it.dbi-services.com:5432)
14:57:29 12845 INFO  Ensuring HBA rules for node 2 "node_2" (pgaf2.it.dbi-services.com:5432)
14:57:29 12845 INFO  Transition complete: current state is now "catchingup"
14:57:29 12845 INFO  keeper has been successfully initialized.
14:57:29 12845 INFO   /u01/app/postgres/product/13/db_1/bin/pg_autoctl do service node-active --pgdata /u02/pgdata/13/PG1 -v
14:57:29 12845 INFO  Reloaded the new configuration from "/home/postgres/.config/pg_autoctl/u02/pgdata/13/PG1/pg_autoctl.cfg"
14:57:29 12845 INFO  pg_autoctl service is running, current state is "catchingup"
14:57:29 12845 INFO  Fetched current list of 2 other nodes from the monitor to update HBA rules, including 2 changes.
14:57:29 12845 INFO  Ensuring HBA rules for node 1 "node_1" (pgaf1.it.dbi-services.com:5432)
14:57:29 12845 INFO  Ensuring HBA rules for node 2 "node_2" (pgaf2.it.dbi-services.com:5432)
14:57:29 12845 INFO  Monitor assigned new state "secondary"
14:57:29 12845 INFO  FSM transition from "catchingup" to "secondary": Convinced the monitor that I'm up and running, and eligible for promotion again
14:57:29 12833 WARN  PostgreSQL was not running, restarted with pid 12881
14:57:29 12845 INFO  Creating replication slot "pgautofailover_standby_1"
14:57:29 12845 INFO  Creating replication slot "pgautofailover_standby_2"
14:57:29 12845 INFO  Transition complete: current state is now "secondary"
14:57:29 12845 INFO  New state for node 1 "node_1" (pgaf1.it.dbi-services.com:5432): primary ➜ primary

That really was quite simple. We now have two replicas synchronizing from the same primary:

postgres=# select usename,application_name,client_hostname,sent_lsn,write_lsn,flush_lsn,replay_lsn,write_lag from pg_stat_replication ;
          usename          |     application_name     |      client_hostname      | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag 
---------------------------+--------------------------+---------------------------+-----------+-----------+-----------+------------+-----------
 pgautofailover_replicator | pgautofailover_standby_2 | pgaf2.it.dbi-services.com | 0/6000148 | 0/6000148 | 0/6000148 | 0/6000148  | 
 pgautofailover_replicator | pgautofailover_standby_3 | pgaf3.it.dbi-services.com | 0/6000148 | 0/6000148 | 0/6000148 | 0/6000148  | 
(2 rows)

If you prepare that well, it is a matter of a few minutes and a setup like this is up and running. For the setup part, one bit is missing: all these pg_autoctl commands did not detach from the console but keep running in the foreground, and everything stops if we cancel the commands or close the terminal.

Luckily pg_auto_failover comes with a handy command to create a systemd service file:

postgres@pgaf1:~$ pg_autoctl -q show systemd --pgdata /u02/pgdata/13/monitor/ > pgautofailover.service
postgres@pgaf1:~$ cat pgautofailover.service
[Unit]
Description = pg_auto_failover

[Service]
WorkingDirectory = /home/postgres
Environment = 'PGDATA=/u02/pgdata/13/monitor/'
User = postgres
ExecStart = /u01/app/postgres/product/13/db_1/bin/pg_autoctl run
Restart = always
StartLimitBurst = 0

[Install]
WantedBy = multi-user.target

This can easily be added to systemd so the monitor will start automatically:

postgres@pgaf1:~$ sudo mv pgautofailover.service /etc/systemd/system
postgres@pgaf1:~$ sudo systemctl daemon-reload
postgres@pgaf1:~$ sudo systemctl enable pgautofailover.service
Created symlink /etc/systemd/system/multi-user.target.wants/pgautofailover.service → /etc/systemd/system/pgautofailover.service.
postgres@pgaf1:~$ sudo systemctl start pgautofailover.service

From now on, the service will start automatically when the node boots up. Let’s do the same for the PostgreSQL clusters:

postgres@pgaf1:~$ pg_autoctl -q show systemd --pgdata /u02/pgdata/13/PG1/ > postgresp1.service
postgres@pgaf1:~$ cat postgresp1.service
[Unit]
Description = pg_auto_failover

[Service]
WorkingDirectory = /home/postgres
Environment = 'PGDATA=/u02/pgdata/13/PG1/'
User = postgres
ExecStart = /u01/app/postgres/product/13/db_1/bin/pg_autoctl run
Restart = always
StartLimitBurst = 0

[Install]
WantedBy = multi-user.target
postgres@pgaf1:~$ sudo mv postgresp1.service /etc/systemd/system
postgres@pgaf1:~$ sudo systemctl daemon-reload
postgres@pgaf1:~$ sudo systemctl enable postgresp1.service
Created symlink /etc/systemd/system/multi-user.target.wants/postgresp1.service → /etc/systemd/system/postgresp1.service.
postgres@pgaf1:~$ sudo systemctl start postgresp1.service

Do the same on the remaining two nodes and reboot all systems. If all went fine, pg_auto_failover and the PostgreSQL clusters will come up automatically:

postgres@pgaf1:~$ pg_autoctl show state --pgdata /u02/pgdata/13/monitor/
  Name |  Node |                      Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+--------------------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | pgaf1.it.dbi-services.com:5432 | 0/6002320 |       yes |             primary |             primary
node_2 |     2 | pgaf2.it.dbi-services.com:5432 | 0/6002320 |       yes |           secondary |           secondary
node_3 |     3 | pgaf3.it.dbi-services.com:5432 | 0/6002320 |       yes |           secondary |           secondary

That’s it for the first part. In the next post we’ll look at how robust pg_auto_failover is, e.g. what happens when the first node, which also runs the monitor, goes down?

The article Easy failover and switchover with pg_auto_failover appeared first on the dbi services blog.

Oracle 21c Security : Gradual Database Password Rollover

Thu, 2020-12-10 06:14

Starting with Oracle 21c, an application’s database password can be changed without having to schedule a downtime. This can be done by using the new profile parameter PASSWORD_ROLLOVER_TIME.
It sets a rollover period during which the application can log in using either the old password or the new one. With this enhancement, an administrator no longer needs to take the application down when the application database password is being rotated.
Let’s see in this blog how this works.

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           READ WRITE NO
SQL>

First we create a profile in PDB1

SQL> show con_name;

CON_NAME
------------------------------
PDB1


SQL> CREATE PROFILE testgradualrollover LIMIT
 FAILED_LOGIN_ATTEMPTS 4
 PASSWORD_ROLLOVER_TIME 4;  

Profile created.

SQL>

Note that the parameter PASSWORD_ROLLOVER_TIME is specified in days. For example, 1/24 means one hour.
The minimum value for this parameter is one hour, and the maximum value is 60 days, or the lower of the PASSWORD_LIFE_TIME and PASSWORD_GRACE_TIME limits if that is smaller.
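
As a hypothetical example, the rollover window of our test profile could be shrunk to one hour like this:

SQL> alter profile testgradualrollover limit password_rollover_time 1/24;

Profile altered.

SQL>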
Now let’s create a new user in PDB1 and assign it the profile we just created:

SQL> create user edge identified by "Borftg8957##"  profile testgradualrollover;

User created.

SQL> grant create session to edge;

Grant succeeded.

SQL>

We can also verify the status of the account in the PDB

SQL>  select username,account_status from dba_users where username='EDGE';

USERNAME             ACCOUNT_STATUS
-------------------- --------------------
EDGE                 OPEN

SQL>

Now let’s log in with the new user:


[oracle@oraadserver admin]$ sqlplus edge/"Borftg8957##"@pdb1

SQL*Plus: Release 21.0.0.0.0 - Production on Thu Dec 10 11:14:07 2020
Version 21.1.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.


Connected to:
Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 - Production
Version 21.1.0.0.0

SQL> show con_name;

CON_NAME
------------------------------
PDB1
SQL> show user;
USER is "EDGE"
SQL>

Now let’s change the password of the user edge

SQL> alter user edge identified by "Morfgt5879!!";

User altered.

SQL>

As the rollover period is set to 4 days in the profile testgradualrollover, the user edge should be able to connect for 4 days with either the old password or the new one.
Let’s test with the old password:

[oracle@oraadserver admin]$ sqlplus edge/"Borftg8957##"@pdb1

SQL*Plus: Release 21.0.0.0.0 - Production on Thu Dec 10 11:21:02 2020
Version 21.1.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.

Last Successful login time: Thu Dec 10 2020 11:14:07 +01:00

Connected to:
Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 - Production
Version 21.1.0.0.0

SQL> show con_name;

CON_NAME
------------------------------
PDB1
SQL> show user;
USER is "EDGE"
SQL>

Let’s test with the new password

[oracle@oraadserver ~]$ sqlplus edge/'Morfgt5879!!'@pdb1

SQL*Plus: Release 21.0.0.0.0 - Production on Thu Dec 10 11:24:52 2020
Version 21.1.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.

Last Successful login time: Thu Dec 10 2020 11:21:02 +01:00

Connected to:
Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 - Production
Version 21.1.0.0.0

SQL> show user;
USER is "EDGE"

SQL> show con_name;

CON_NAME
------------------------------
PDB1
SQL>

We can see that the connection succeeds in both cases. If we query DBA_USERS, we can see the status of the rollover:

SQL> select username,account_status from dba_users where username='EDGE';

USERNAME             ACCOUNT_STATUS
-------------------- --------------------
EDGE                 OPEN & IN ROLLOVER

There are several ways to end the password rollover period:
-Let the password rollover period expire on its own
-As either the user or an administrator, run the command

    Alter user edge expire password rollover period;

-As an administrator, expire the user's password

    Alter user edge password expire;

The database behavior during the gradual password rollover period is described in the documentation.

The article Oracle 21c Security : Gradual Database Password Rollover appeared first on the dbi services blog.

Oracle 21c Security : ORA_STIG_PROFILE and ORA_CIS_PROFILE

Thu, 2020-12-10 06:14

In my previous blog I was testing the creation of a new Oracle 21c database. In this blog I am talking about two security-related changes.
With each new release Oracle strengthens security. That’s why, since Oracle 12.2, Oracle Database has provided the profile ORA_STIG_PROFILE to help meet Security Technical Implementation Guide (STIG) compliance.
With Oracle 21c, the ORA_STIG_PROFILE profile has been updated and Oracle provides a new profile to meet the CIS standard: ORA_CIS_PROFILE.
The ORA_STIG_PROFILE user profile has been updated with the latest Security Technical Implementation Guide (STIG) guidelines, and ORA_CIS_PROFILE follows the latest Center for Internet Security (CIS) guidelines.

ORA_STIG_PROFILE
In an Oracle 19c database, we can find the following for the ORA_STIG_PROFILE:

SQL> select profile,resource_name,limit from dba_profiles where profile='ORA_STIG_PROFILE' order by resource_name;

PROFILE                        RESOURCE_NAME                  LIMIT
------------------------------ ------------------------------ ------------------------------
ORA_STIG_PROFILE               COMPOSITE_LIMIT                DEFAULT
ORA_STIG_PROFILE               CONNECT_TIME                   DEFAULT
ORA_STIG_PROFILE               CPU_PER_CALL                   DEFAULT
ORA_STIG_PROFILE               CPU_PER_SESSION                DEFAULT
ORA_STIG_PROFILE               FAILED_LOGIN_ATTEMPTS          3
ORA_STIG_PROFILE               IDLE_TIME                      15
ORA_STIG_PROFILE               INACTIVE_ACCOUNT_TIME          35
ORA_STIG_PROFILE               LOGICAL_READS_PER_CALL         DEFAULT
ORA_STIG_PROFILE               LOGICAL_READS_PER_SESSION      DEFAULT
ORA_STIG_PROFILE               PASSWORD_GRACE_TIME            5
ORA_STIG_PROFILE               PASSWORD_LIFE_TIME             60
ORA_STIG_PROFILE               PASSWORD_LOCK_TIME             UNLIMITED
ORA_STIG_PROFILE               PASSWORD_REUSE_MAX             10
ORA_STIG_PROFILE               PASSWORD_REUSE_TIME            365
ORA_STIG_PROFILE               PASSWORD_VERIFY_FUNCTION       ORA12C_STIG_VERIFY_FUNCTION
ORA_STIG_PROFILE               PRIVATE_SGA                    DEFAULT
ORA_STIG_PROFILE               SESSIONS_PER_USER              DEFAULT

17 rows selected.

SQL>

Now, in Oracle 21c, we can see that there are some changes:

SQL> select profile,resource_name,limit from dba_profiles where profile='ORA_STIG_PROFILE' order by RESOURCE_NAME;

PROFILE                        RESOURCE_NAME                  LIMIT
------------------------------ ------------------------------ ------------------------------
ORA_STIG_PROFILE               COMPOSITE_LIMIT                DEFAULT
ORA_STIG_PROFILE               CONNECT_TIME                   DEFAULT
ORA_STIG_PROFILE               CPU_PER_CALL                   DEFAULT
ORA_STIG_PROFILE               CPU_PER_SESSION                DEFAULT
ORA_STIG_PROFILE               FAILED_LOGIN_ATTEMPTS          3
ORA_STIG_PROFILE               IDLE_TIME                      15
ORA_STIG_PROFILE               INACTIVE_ACCOUNT_TIME          35
ORA_STIG_PROFILE               LOGICAL_READS_PER_CALL         DEFAULT
ORA_STIG_PROFILE               LOGICAL_READS_PER_SESSION      DEFAULT
ORA_STIG_PROFILE               PASSWORD_GRACE_TIME            0
ORA_STIG_PROFILE               PASSWORD_LIFE_TIME             35
ORA_STIG_PROFILE               PASSWORD_LOCK_TIME             UNLIMITED
ORA_STIG_PROFILE               PASSWORD_REUSE_MAX             5
ORA_STIG_PROFILE               PASSWORD_REUSE_TIME            175
ORA_STIG_PROFILE               PASSWORD_ROLLOVER_TIME         DEFAULT
ORA_STIG_PROFILE               PASSWORD_VERIFY_FUNCTION       ORA12C_STIG_VERIFY_FUNCTION
ORA_STIG_PROFILE               PRIVATE_SGA                    DEFAULT
ORA_STIG_PROFILE               SESSIONS_PER_USER              DEFAULT

18 rows selected.

SQL>

The following parameters were updated:

-PASSWORD_GRACE_TIME
-PASSWORD_LIFE_TIME
-PASSWORD_REUSE_MAX
-PASSWORD_REUSE_TIME
-And there is a new parameter PASSWORD_ROLLOVER_TIME

ORA_CIS_PROFILE
Below are the characteristics of the new profile:

SQL> select profile,resource_name,limit from dba_profiles where profile='ORA_CIS_PROFILE' order by RESOURCE_NAME;

PROFILE                        RESOURCE_NAME                  LIMIT
------------------------------ ------------------------------ ------------------------------
ORA_CIS_PROFILE                COMPOSITE_LIMIT                DEFAULT
ORA_CIS_PROFILE                CONNECT_TIME                   DEFAULT
ORA_CIS_PROFILE                CPU_PER_CALL                   DEFAULT
ORA_CIS_PROFILE                CPU_PER_SESSION                DEFAULT
ORA_CIS_PROFILE                FAILED_LOGIN_ATTEMPTS          5
ORA_CIS_PROFILE                IDLE_TIME                      DEFAULT
ORA_CIS_PROFILE                INACTIVE_ACCOUNT_TIME          120
ORA_CIS_PROFILE                LOGICAL_READS_PER_CALL         DEFAULT
ORA_CIS_PROFILE                LOGICAL_READS_PER_SESSION      DEFAULT
ORA_CIS_PROFILE                PASSWORD_GRACE_TIME            5
ORA_CIS_PROFILE                PASSWORD_LIFE_TIME             90
ORA_CIS_PROFILE                PASSWORD_LOCK_TIME             1
ORA_CIS_PROFILE                PASSWORD_REUSE_MAX             20
ORA_CIS_PROFILE                PASSWORD_REUSE_TIME            365
ORA_CIS_PROFILE                PASSWORD_ROLLOVER_TIME         DEFAULT
ORA_CIS_PROFILE                PASSWORD_VERIFY_FUNCTION       ORA12C_VERIFY_FUNCTION
ORA_CIS_PROFILE                PRIVATE_SGA                    DEFAULT
ORA_CIS_PROFILE                SESSIONS_PER_USER              10

18 rows selected.

SQL>

These user profiles can be used directly with your database users or as a starting point for your own user profiles. Oracle keeps these profiles up to date to make it easier for you to implement password policies that meet the STIG and CIS guidelines.
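
For example, assigning them is a simple one-liner (the user names below are made up):

SQL> create user app_owner identified by "SomeStr0ng##Passw0rd" profile ORA_CIS_PROFILE;

User created.

SQL> alter user app_owner profile ORA_STIG_PROFILE;

User altered.

SQL>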

The article Oracle 21c Security : ORA_STIG_PROFILE and ORA_CIS_PROFILE appeared first on the dbi services blog.

A Simple Repository Browser Utility

Wed, 2020-12-09 13:48

A few weeks ago, as the final step of a cloning procedure, I wanted to check if the cloned repository was OK. One of the tests was to peek and poke around in the repository and try to access its content. This is typically the kind of task for which you’d use a GUI-based program, because it is much quicker and easier than sending manually typed commands to the server from within idql and iapi and transferring the contents to a desktop where a pdf reader, word processor and spreadsheet programs can be used to visualize them. Documentum Administrator (alias DA) is the tool we generally use for this purpose. It is a browser-based java application deployed on a web application server such as Oracle WebLogic (which is overkill just for DA) or tomcat. It also requires IE as the browser because DA needs to download an executable extension for Windows in order to enable certain functionalities.

So, I had to download and install the full requirements stack to enable DA: an openjdk (several trials before the correct one, an OpenJDK v11, was found), tomcat, DA (twice, one was apparently crippled), configure and deploy DA (with a lot of confusing date errors which seemed related to the cloning process but were not, after all), start my Windows VM (all 8 Gb of RAM of it), start IE (which I never use, and you shouldn’t either), point IE to the AWS instance DA was installed on, download and install the extension when prompted to do so, all this only to notice that 1. content visualization still did not work and 2. the extension’s installation did not stick, as it kept asking to download and install it over and over again. All this DA part took twice as long as the cloning process itself.

All I wanted was to browse the repository and click on a few random files here and there to see if their content was reachable, and to do that I had to install several Gb of, dare I say it, bloatware. “This is ridiculous”, I thought, there has to be a better way. And indeed there is.
I remembered a cute little python module I use sometimes, server.py. It embeds a web server and presents a navigable web interface to the file system directory it is started from. From there, one can click on a file link and the file is opened in the browser, or by the right application if one is installed and the mime file association is correct; or click on a sub-directory link to enter it. Colleagues can also use the URL to come and fetch files from my machine if needed, a quick way to share files, albeit temporarily.
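
As the module appears to be a standalone copy of Python’s http.server, starting it should boil down to something like the following (the port argument is an assumption on my part, check the module’s help):

$ cd /some/directory/to/share
$ python3 server.py 8000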
 
Starting the file server in the current directory:

 
Current directory’s listing:

As it is open source, its code is available here: server.py.
The file operations per se, mainly calls to the os module, were very few and thin, so I decided to give it a try, replacing them with calls to the repository through the module DctmAPI.py (see the blog DctmAPI.py). The result, after resolving a few issues due to the way Documentum repositories implement the file system metaphor, was quite effective and is presented in this blog. Enjoy.
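
To give an idea of the glue code involved, here is a minimal sketch of two of the helpers the patch below relies on, DctmBrowser.isdir() and DctmBrowser.listdir(). The function names and the (path, r_object_id) tuple come from the patch, but the bodies are only my illustration (assuming a DctmAPI.select2dict(session, dql) helper as described in the DctmAPI.py blog), not the actual module:

# DctmBrowser.py -- illustrative sketch only, not the real module
import DctmAPI

def isdir(session, path):
    # path is a (name_or_path, r_object_id) tuple as built by the patched handler;
    # Documentum folder ids start with "0b" and cabinet ids with "0c", anything else
    # (e.g. documents, "09...") is a leaf; the repository root has no r_object_id;
    return path[1] is None or path[1][:2] in ("0b", "0c")

def listdir(session, path):
    # list of (object_name, r_object_id) tuples for the folder's direct children;
    # (the real module must also handle the root, i.e. listing the cabinets)
    dql = f"select r_object_id, object_name from dm_sysobject where folder(ID('{path[1]}'))"
    return [(d["object_name"], d["r_object_id"]) for d in DctmAPI.select2dict(session, dql)]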

Installing the module

As the saying goes, the shoemaker’s son always goes barefoot, so no GitHub here: you’ll have to download the module’s original code from the aforementioned site, rename it to original-server.py and patch it. The changes have been kept minimal so that the resulting patch file stays small and manageable.
On my Linux box, the downloaded source had extraneous empty lines, which I removed with following one-liner:

$ gawk -v RS='\n\n' '{print}' original-server.py > tmp.py; mv tmp.py original-server.py

After that, save the following patch instructions into the file delta.patch:

623a624,625
> import DctmBrowser
> 
637a640,644
>     session = None 
> 
>     import re
>     split = re.compile('(.+?)\(([0-9a-f]{16})\)')
>     last = re.compile('(.+?)\(([0-9a-f]{16})\).?$')
666,667c673,674
<         f = None
>         # now path is a tuple (current path, r_object_id)
>         if DctmBrowser.isdir(SimpleHTTPRequestHandler.session, path):
678,685c685,686
<             for index in "index.html", "index.htm":
<                 index = os.path.join(path, index)
<                 if os.path.exists(index):
<                     path = index
<                     break
<             else:
<                 return self.list_directory(path)
>             return self.list_directory(path)
>         f = None
687c688
>             f = DctmBrowser.docopen(SimpleHTTPRequestHandler.session, path[1], 'rb')
693c694
>             self.send_header("Content-type", DctmBrowser.splitext(SimpleHTTPRequestHandler.session, path[1]))
709a711
>         path is a (r_folder_path, r_object_id) tuple;
712c714
>             list = DctmBrowser.listdir(SimpleHTTPRequestHandler.session, path)
718c720
>         list.sort(key=lambda a: a[0].lower())
721,722c723,726
<             displaypath = urllib.parse.unquote(self.path,
>             if ("/" != self.path):
>                displaypath = "".join(i[0] for i in SimpleHTTPRequestHandler.split.findall(urllib.parse.unquote(self.path, errors='surrogatepass')))
>             else:
>                displaypath = "/"
724c728
>             displaypath = urllib.parse.unquote(path[0])
727c731
>         title = 'Repository listing for %s' % displaypath
734c738
<         r.append('\n%s' % title)
---
>         r.append('%s\n' % title)
736,737c740,745
<         for name in list:
>         # add an .. for the parent folder;
>         if ("/" != path[0]):
>             linkname = "".join(i[0] + "(" + i[1] + ")" for i in SimpleHTTPRequestHandler.split.findall(urllib.parse.unquote(self.path, errors='surrogatepass'))[:-1]) or "/"
>             r.append('<li><a href="%s">%s</a></li>' % (urllib.parse.quote(linkname, errors='surrogatepass'), html.escape("..")))
>         for (name, r_object_id) in list:
>             fullname = os.path.join(path[0], name)
740c748
>             if DctmBrowser.isdir(SimpleHTTPRequestHandler.session, (name, r_object_id)):
742,749c750,751
<                 linkname = name + "/"
<             if os.path.islink(fullname):
<                 displayname = name + "@"
<                 # Note: a link to a directory displays with @ and links with /
<             r.append('<li><a href="%s">%s</a></li>'
<                     % (urllib.parse.quote(linkname,
<                                           errors='surrogatepass'),
>                 linkname = name + "(" + r_object_id + ")" + "/"
>             r.append('<li><a href="%s">%s</a></li>' % (urllib.parse.quote(linkname, errors='surrogatepass'), html.escape(displayname)))
762,767c764
<         """Translate a /-separated PATH to the local filename syntax.
< 
<         Components that mean special things to the local file system
<         (e.g. drive or directory names) are ignored. (XXX They should
<         probably be diagnosed.)
>         """Extracts the path and r_object_id parts of a path formatted thusly: /....(r_object_id){/....(r_object_id)}
768a766,768
>         if "/" == path:
>             return (path, None)
> 
773d772
<         trailing_slash = path.rstrip().endswith('/')
781c780
>         path = "/"
787,789c786,787
<         if trailing_slash:
<             path += '/'
>         (path, r_object_id) = SimpleHTTPRequestHandler.last.findall(path)[0]
>         return (path, r_object_id)
807,840d804
<     def guess_type(self, path):
<         """Guess the type of a file.
< 
<         Argument is a PATH (a filename).
< 
<         Return value is a string of the form type/subtype,
<         usable for a MIME Content-type header.
< 
<         The default implementation looks the file's extension
<         up in the table self.extensions_map, using application/octet-stream
<         as a default; however it would be permissible (if
<         slow) to look inside the data to make a better guess.
< 
<         """
< 
<         base, ext = posixpath.splitext(path)
<         if ext in self.extensions_map:
<             return self.extensions_map[ext]
<         ext = ext.lower()
<         if ext in self.extensions_map:
<             return self.extensions_map[ext]
<         else:
<             return self.extensions_map['']
< 
<     if not mimetypes.inited:
<         mimetypes.init() # try to read system mime.types
<     extensions_map = mimetypes.types_map.copy()
<     extensions_map.update({
<         '': 'application/octet-stream', # Default
<         '.py': 'text/plain',
<         '.c': 'text/plain',
<         '.h': 'text/plain',
<         })
1175c1140
<          ServerClass=HTTPServer, protocol="HTTP/1.0", port=8000, bind=""):
>          ServerClass=HTTPServer, protocol="HTTP/1.0", port=8000, bind="", session = None):
1183a1149
>     HandlerClass.session = session
1212d1177
< 

    Apply the patch using the following command:

    $ patch -n original-server.py delta.patch -o server.py
    

    server.py is the patched module with the repository access operations replacing the file system access ones.
    As the command line needs a few more parameters for connecting to the repository, an updated main block has been added to parse them and moved into the new executable browser_repo.py. Here it is:

    import argparse
    import sys         # needed by sys.exit() below
    import server
    import textwrap
    import DctmAPI
    import DctmBrowser
    
    if __name__ == '__main__':
        parser = argparse.ArgumentParser(
           formatter_class=argparse.RawDescriptionHelpFormatter,
           description = textwrap.dedent("""\
    A web page to navigate a docbase's cabinets & folders.
    Based on Łukasz Langa's python module server.py https://hg.python.org/cpython/file/3.5/Lib/http/server.py
    cec at dbi-services.com, December 2020, integration with Documentum repositories;
    """))
        parser.add_argument('--bind', '-b', default='', metavar='ADDRESS',
                            help='Specify alternate bind address [default: all interfaces]')
        parser.add_argument('--port', action='store',
                            default=8000, type=int,
                            nargs='?',
                            help='Specify alternate port [default: 8000]')
        parser.add_argument('-d', '--docbase', action='store',
                            default='dmtest73', type=str,
                            nargs='?',
                            help='repository name [default: dmtest73]')
        parser.add_argument('-u', '--user_name', action='store',
                            default='dmadmin',
                            nargs='?',
                            help='user name [default: dmadmin]')
        parser.add_argument('-p', '--password', action='store',
                            default='dmadmin',
                            nargs='?',
                            help=' user password [default: "dmadmin"]')
        args = parser.parse_args()
    
        # Documentum initialization and connecting here;
        DctmAPI.logLevel = 1
    
        # not really needed as it is done in the module itself;
        status = DctmAPI.dmInit()
        if status:
           print("dmInit() was successful")
        else:
           print("dmInit() was not successful, exiting ...")
           sys.exit(1)
    
        session = DctmAPI.connect(args.docbase, args.user_name, args.password)
        if session is None:
           print("no session opened in docbase %s as user %s, exiting ..." % (args.docbase, args.user_name))
           exit(1)
    
        try:
           server.test(HandlerClass=server.SimpleHTTPRequestHandler, port=args.port, bind=args.bind, session = session)
        finally:
           print("disconnecting from repository")
           DctmAPI.disconnect(session)
    

    Save it into file browser_repo.py. This is the new main program.
    Finally, helper functions have been added to interface the main program to the module DctmAPI:

    #
    # new helper functions for browser_repo.py;
    #
    
    import DctmAPI
    
    def isdir(session, path):
       """
       return a truthy value (the folder's id, or True for the root) if path is a folder, a falsy one otherwise;
       path is a tuple (r_folder_path, r_object_id);
       """
       if "/" == path[0]:
          return True
       else:
          id = DctmAPI.dmAPIGet("retrieve, " + session + ",dm_folder where r_object_id = '" + path[1] + "'")
       return id
    
    def listdir(session, path):
       """
       return a tuple of objects, folders or documents with their r_object_id, in folder path[0];
       path is a tuple (r_folder_path, r_object_id);
       """
       result = []
       if path[0] in ("/", ""):
          DctmAPI.select2dict(session, "select object_name, r_object_id from dm_cabinet", result)
       else:
          DctmAPI.select2dict(session, "select object_name, r_object_id from dm_document where folder(ID('" + path[1] + "')) UNION select object_name, r_object_id from dm_folder where folder(ID('" + path[1] + "'))", result)
       return [[doc["object_name"], doc["r_object_id"]] for doc in result]
    
    def docopen(session, r_object_id, mode):
       """
       returns a file handle on the document with id r_object_id downloaded from its repository to the temporary location and opened;
       """
       temp_storage = '/tmp/'
       if DctmAPI.dmAPIGet("getfile," + session + "," + r_object_id + "," + temp_storage + r_object_id):
          return open(temp_storage + r_object_id, mode)
       else:
          raise OSError
    
    def splitext(session, r_object_id):
       """
       returns the mime type as defined in dm_format for the document with id r_object_id;
       """
       result = []
       DctmAPI.select2dict(session, "select mime_type from dm_format where r_object_id in (select format from dmr_content c, dm_document d where any c.parent_id = d.r_object_id and d.r_object_id = '" + r_object_id + "')", result)
       return result[0]["mime_type"] if result else ""
    

    Save this code into the file DctmBrowser.py.
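    If you want to give these helpers a quick try outside the web server, an interactive smoke test could look like the sketch below (docbase name and credentials are placeholders to adapt):

    import DctmAPI
    import DctmBrowser
    
    # initialize the binding and open a session, same calls as in browser_repo.py;
    DctmAPI.dmInit()
    session = DctmAPI.connect("dmtest73", "dmadmin", "dmadmin")
    try:
       # ("/", None) stands for the repository's root, i.e. the list of cabinets;
       print(DctmBrowser.isdir(session, ("/", None)))        # True
       for name, r_object_id in DctmBrowser.listdir(session, ("/", None)):
          print(name, r_object_id)
    finally:
       DctmAPI.disconnect(session)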
    To summarize, we have:
    1. the original module original-server.py to be downloaded from the web
    2. delta.patch, the diff file used to patch original-server.py into file server.py
    3. DctmAPI.py, the python interface to Documentum, to be fetched from the provided link to a past blog
    4. helper functions in module DctmBrowser.py
    5. and finally the main executable browser_repo.py
    Admittedly, a git repository would be nice here, maybe one day …
    Use the command below to get the program’s help screen:

    $ python browser_repo.py --help                        
    usage: browser_repo.py [-h] [--bind ADDRESS] [--port [PORT]] [-d [DOCBASE]]
                          [-u [USER_NAME]] [-p [PASSWORD]]
    
    A web page to navigate a docbase's cabinets & folders.
    Based on Łukasz Langa's python module server.py https://hg.python.org/cpython/file/3.5/Lib/http/server.py
    cec at dbi-services.com, December 2020, integration with Documentum repositories;
    
    optional arguments:
      -h, --help            show this help message and exit
      --bind ADDRESS, -b ADDRESS
                            Specify alternate bind address [default: all
                            interfaces]
      --port [PORT]         Specify alternate port [default: 8000]
      -d [DOCBASE], --docbase [DOCBASE]
                            repository name [default: dmtest73]
      -u [USER_NAME], --user_name [USER_NAME]
                            user name [default: dmadmin]
      -p [PASSWORD], --password [PASSWORD]
                            user password [default: "dmadmin"]
    

    Thus, the command below will launch the server on port 9000 with a session opened in repository dmtest73 as user dmadmin with password dmadmin:

    $ python browser_repo.py --port 9000 -d dmtest73 -u dmadmin -p dmadmin 
    

    If you prefer long name options, use the alternative below:

    $ python browser_repo.py --port 9000 --docbase dmtest73 --user_name dmadmin --password dmadmin 
    

    Start your favorite browser, any browser, just as God intended it in the first place, and point it to the host where you started the program with the specified port, e.g. http://192.168.56.10:9000/:

    You are gratified with a very spartan, yet effective, view on the repository’s cabinets. Congratulations, you did it !

    Moving around in the repository

    As there is no root directory in a repository, an empty path or “/” is interpreted as a request to display the list of all the cabinets; each cabinet is the root of a directory tree. The program displays dm_folder and dm_cabinet objects (cabinets being a sub-type of dm_folder after all), and dm_document objects. Folders have a trailing slash to identify them, whereas documents have none. There are many other object types in repositories’ folders; I chose not to display them because I did not need to, but this can be changed on lines 25 and 27 of the helper module DctmBrowser.py by specifying a different doctype, e.g. the super-type dm_sysobject instead, as sketched below.
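    As an illustration only, here is what such a variant of listdir() could look like if one wanted any dm_sysobject to be listed; the function name listdir_any is mine and this sketch is not part of the published code:

    import DctmAPI
    
    def listdir_any(session, path):
       """
       hypothetical variant of DctmBrowser.listdir(): return any dm_sysobject, not only
       dm_document, linked to the folder path[1]; since dm_folder and dm_document are both
       sub-types of dm_sysobject, a single SELECT is enough here;
       """
       result = []
       if path[0] in ("/", ""):
          DctmAPI.select2dict(session, "select object_name, r_object_id from dm_cabinet", result)
       else:
          DctmAPI.select2dict(session, "select object_name, r_object_id from dm_sysobject where folder(ID('" + path[1] + "'))", result)
       return [[doc["object_name"], doc["r_object_id"]] for doc in result]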
    An addition to the original server module is the .. link to the parent folder; I think it is easier to use than the browser’s back button or right click/back arrow, but those remain usable since the program is stateless. Actually, a starting page could even be specified manually in the starting URL if it weren’t for its unusual format. In effect, the folder components and the documents’ full paths in URLs and html links are suffixed with a parenthesized r_object_id, e.g.:

    http://192.168.56.10:9000/System(0c00c35080000106)/Sysadmin(0b00c3508000034e)/Reports(0b00c35080000350)/
    -- or, url-encoded:
    http://192.168.56.10:9000/System%280c00c35080000106%29/Sysadmin%280b00c3508000034e%29/Reports%280b00c35080000350%29/
    

    This looks ugly but it solves 2 issues specific to repositories:
    1. Document names are not unique within a folder; they are on a par with any other document attribute. Consequently, a folder can quietly contain hundreds of identically named documents without any name conflict. In effect, what tells two documents apart is their unique r_object_id attribute, and that is the reason why it is appended to the links and URLs. This is not a big deal because this potentially annoying technical information is not displayed in the web page; it is only visible while hovering over links and in the browser’s address bar.
    2. Document names can contain any character, even “/” and “:”. So, given a document’s full path name, how to parse it and separate the parent folder from the document’s name so it can be reached ? There is no generic, unambiguous way to do that. With the appended document’s unique r_object_id, it is a simple matter to extract the id from the full path and Bob’s your uncle (RIP Jerry P.).
    Both specificities above make it impossible to access a document through its full path name alone, therefore the documents’ ids must be carried around; for folders, it is not necessary but it has been done in order to have a uniform format. As a side-effect, database performance is possibly better too.
    If the program is started with no stdout redirection, log messages are visible on the screen, e.g.:

    dmadmin@dmclient:~/dctm-webserver$ python browser_repo.py --port 9000 --docbase dmtest73 --user_name dmadmin --password dmadmin 
    dmInit() was successful
    Serving HTTP on 0.0.0.0 port 9000 ...
    192.168.56.1 - - [05/Dec/2020 22:57:00] "GET / HTTP/1.1" 200 -
    192.168.56.1 - - [05/Dec/2020 22:57:03] "GET /System%280c00c35080000106%29/ HTTP/1.1" 200 -
    192.168.56.1 - - [05/Dec/2020 22:57:07] "GET /System%280c00c35080000106%29/Sysadmin%280b00c3508000034e%29/ HTTP/1.1" 200 -
    192.168.56.1 - - [05/Dec/2020 22:57:09] "GET /System%280c00c35080000106%29/Sysadmin%280b00c3508000034e%29/Reports%280b00c35080000350%29/ HTTP/1.1" 200 -
    192.168.56.1 - - [05/Dec/2020 22:57:14] "GET /System%280c00c35080000106%29/Sysadmin%280b00c3508000034e%29/Reports%280b00c35080000350%29/ConsistencyChecker%280900c3508000211e%29/ HTTP/1.1" 200 -
    192.168.56.1 - - [05/Dec/2020 22:57:22] "GET /System%280c00c35080000106%29/Sysadmin%280b00c3508000034e%29/Reports%280b00c35080000350%29/StateOfDocbase%280900c35080002950%29/ HTTP/1.1" 200 -
    192.168.56.1 - - [05/Dec/2020 22:57:27] "GET /System%280c00c35080000106%29/Sysadmin%280b00c3508000034e%29/ HTTP/1.1" 200 -
    ...
    

    The logged information and format are quite standard for web servers, one log line per request, beginning with the client’s ip address, the timestamp, request type (there will be only GETs as the utility is read-only) and resource, and the returned http status code.
    If the variable DctmAPI.logLevel is set to True (or 1, or a non-empty string or collection, as python interprets them all as the boolean True) in the main program, API statements and messages from the repository are logged to stdout too, which can help if troubleshooting is needed, e.g.:

    dmadmin@dmclient:~/dctm-webserver$ python browser_repo.py --port 9000 --docbase dmtest73 --user_name dmadmin --password dmadmin 
    'in dmInit()' 
    "dm= after loading library libdmcl.so" 
    'exiting dmInit()' 
    dmInit() was successful
    'in connect(), docbase = dmtest73, user_name = dmadmin, password = dmadmin' 
    'successful session s0' 
    '[DM_SESSION_I_SESSION_START]info:  "Session 0100c35080002e3d started for user dmadmin."' 
    'exiting connect()' 
    Serving HTTP on 0.0.0.0 port 9000 ...
    'in select2dict(), dql_stmt=select object_name, r_object_id from dm_cabinet' 
    192.168.56.1 - - [05/Dec/2020 23:02:59] "GET / HTTP/1.1" 200 -
    "in select2dict(), dql_stmt=select object_name, r_object_id from dm_document where folder(ID('0c00c35080000106')) UNION select object_name, r_object_id from dm_folder where folder(ID('0c00c35080000106'))" 
    192.168.56.1 - - [05/Dec/2020 23:03:03] "GET /System%280c00c35080000106%29/ HTTP/1.1" 200 -
    "in select2dict(), dql_stmt=select object_name, r_object_id from dm_document where folder(ID('0b00c3508000034e')) UNION select object_name, r_object_id from dm_folder where folder(ID('0b00c3508000034e'))" 
    192.168.56.1 - - [05/Dec/2020 23:03:05] "GET /System%280c00c35080000106%29/Sysadmin%280b00c3508000034e%29/ HTTP/1.1" 200 -
    "in select2dict(), dql_stmt=select object_name, r_object_id from dm_document where folder(ID('0b00c35080000350')) UNION select object_name, r_object_id from dm_folder where folder(ID('0b00c35080000350'))" 
    192.168.56.1 - - [05/Dec/2020 23:03:10] "GET /System%280c00c35080000106%29/Sysadmin%280b00c3508000034e%29/Reports%280b00c35080000350%29/ HTTP/1.1" 200 -
    "in select2dict(), dql_stmt=select mime_type from dm_format where r_object_id in (select format from dmr_content c, dm_document d where any c.parent_id = d.r_object_id and d.r_object_id = '0900c3508000211e')" 
    192.168.56.1 - - [05/Dec/2020 23:03:11] "GET /System%280c00c35080000106%29/Sysadmin%280b00c3508000034e%29/Reports%280b00c35080000350%29/ConsistencyChecker%280900c3508000211e%29/ HTTP/1.1" 200 -
    

    Feel free to initialize that variable from the command-line if you prefer.
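    A possible way to do that in browser_repo.py is an extra argparse switch along the lines below (the --verbose flag is an arbitrary name, not part of the published code):

    # hypothetical additional option in browser_repo.py's argument parsing;
    parser.add_argument('-v', '--verbose', action='store_true',
                        help='log API statements and repository messages to stdout')
    # after args = parser.parse_args(), logLevel is simply driven by that flag;
    DctmAPI.logLevel = args.verbose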
    A nice touch in the original module is that execution errors are trapped in an exception handler so the program does not need to be restarted in case of failure. As it is stateless, errors have no effect on subsequent requests.
    Several views on the same repository can be obtained by starting several instances of the program at once on different listening ports. Similarly, if one feels the urge to explore several repositories at once, just start as many instances as needed with different listening ports and appropriate credentials.
    To exit the program, just type ctrl-c; no data will be lost here as the program just browses repositories in read-only mode.

    A few comments on the customizations

    Lines 8 and 9 in the diff above introduce the regular expressions that will be used later to extract the path component/r_object_id couples from the URL’s path part; “split” matches one such couple anywhere in the path, while “last” matches the last one and is aimed at getting the r_object_id of the folder that is clicked on from its full path name. python’s re module allows pre-compiling them for efficiency. Note the .+? syntax to specify a non-greedy regular expression.
    On line 13, the function isdir() is now implemented in the module DctmBrowser and returns True if the clicked item is a folder.
    Similarly, line 25 calls a reimplementation of os.open() in module DctmBrowser that exports the clicked document’s content locally to /tmp, opens it and returns the file handle; this allows the content to be sent to the browser for visualization.
    Line 31 calls a reimplementation of os.listdir() to list the content of the clicked repository folder.
    Line 37 applies the “split” regular expression to the current folder path to extract its tuple components (returned as an array of sub-path/r_object_id couples) and then concatenates the sub-paths together to get the current folder to be displayed later. More concretely, it allows going from
    /System(0c00c35080000106)/Sysadmin(0b00c3508000034e)/Reports(0b00c35080000350)/
    to
    /System/Sysadmin/Reports
    which is displayed in the html page’s title.
    The conciseness of the expression passed to join() is admirable; a lot of programming mistakes and low-level verbosity are prevented thanks to python’s list comprehensions.
    Similarly, on line 52, the current folder’s parent folder is computed from the current path.
    On line 86, the second regular expression, “last”, is applied to extract the r_object_id of the current folder (i.e. the one that is clicked on).
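    To make the effect of the two expressions concrete, here is what they return for one of the sample paths shown earlier (a standalone illustration, not part of the patched code):

    import re
    
    # same expressions as on lines 8 and 9 of the diff;
    split = re.compile('(.+?)\(([0-9a-f]{16})\)')
    last = re.compile('(.+?)\(([0-9a-f]{16})\).?$')
    
    path = "/System(0c00c35080000106)/Sysadmin(0b00c3508000034e)/Reports(0b00c35080000350)/"
    
    # every (sub-path, r_object_id) couple present in the path;
    print(split.findall(path))
    # [('/System', '0c00c35080000106'), ('/Sysadmin', '0b00c3508000034e'), ('/Reports', '0b00c35080000350')]
    
    # the display path, as rebuilt on line 37;
    print("".join(i[0] for i in split.findall(path)))
    # /System/Sysadmin/Reports
    
    # the clicked folder's r_object_id, as extracted on line 86;
    print(last.findall(path)[0][1])
    # 0b00c35080000350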
    Lines 89 to 121 were removed from the original module because mime processing is much simpler here: the repository maintains a list of mime formats (table dm_format) and the selected document’s mime type can be found by just looking up that table, see function splitext() in module DctmBrowser, called on line 27. As a valid mime type is returned, the browser can process the content cleverly, i.e. display the supported content types (such as text) and prompt for some other action otherwise (e.g. for office documents).
    On line 126, the session id is passed to class SimpleHTTPRequestHandler and stored as a class variable; later it is referenced as SimpleHTTPRequestHandler.session in the class, but self.session would work too, although I prefer the former syntax as it makes clear that session does not depend on the instantiations of the class; the session is valid for all of them. As the program connects to only one repository at startup time, there is no need to make session an instance variable.
    The module DctmBrowser is used as a bridge between the module DctmAPI and the main program browser_repo.py. This is where most of the repository stuff is done. As is blatant here, not much is needed to go from listing directories and files from a filesystem to listing folders and documents from a repository.

    Security

    As shown by the usage message above (option --help), a bind address can be specified. By default, the embedded web server listens on all the machine’s network interfaces and, as there is no identification against the web server, another machine on the same network could reach it and access the repository through the opened session if there is no firewall in the way. To prevent this, just specify the loopback IP address, 127.0.0.1 or localhost:

    dmadmin@dmclient:~/dctm-webserver$ python browser_repo.py --bind 127.0.0.1 --port 9000 --docbase dmtest73 --user_name dmadmin --password dmadmin 
    ...
    Serving HTTP on 127.0.0.1 port 9000 ...
    
    # testing locally (no GUI on server, using wget):
    dmadmin@dmclient:~/dctm-webserver$ wget 127.0.0.1:9000
    --2020-12-05 22:06:02--  http://127.0.0.1:9000/
    Connecting to 127.0.0.1:9000... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 831 
    Saving to: 'index.html'
    
    index.html                                           100%[=====================================================================================================================>]     831  --.-KB/s    in 0s      
    
    2020-12-05 22:06:03 (7.34 MB/s) - 'index.html' saved [831/831]
    
    dmadmin@dmclient:~/dctm-webserver$ cat index.html 
    Repository listing for /
    
    Repository listing for /
    
    
    
    
    

    In addition, as the request handler knows the client’s IP address (e.g. through self.client_address, or the self.address_string() method), some finer-grained address restriction could also be implemented by filtering out unwelcome clients and letting authorized ones in, as sketched below.
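    A minimal sketch of such a restriction, assuming one is willing to touch the code, is to override the server’s verify_request() method and pass the resulting class to server.test() through its ServerClass parameter; the whitelist below is of course just an example:

    from http.server import HTTPServer
    
    # hypothetical whitelist of client addresses allowed to browse the repository;
    ALLOWED_CLIENTS = {"127.0.0.1", "192.168.56.1"}
    
    class RestrictedHTTPServer(HTTPServer):
       # verify_request() is inherited from socketserver.BaseServer and is called for
       # each incoming connection; returning False makes the server ignore the request;
       def verify_request(self, request, client_address):
          return client_address[0] in ALLOWED_CLIENTS
    
    # in browser_repo.py, the restricted class would then be passed to the test() function:
    # server.test(HandlerClass=server.SimpleHTTPRequestHandler, ServerClass=RestrictedHTTPServer,
    #             port=args.port, bind=args.bind, session=session)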
    Presently, the original module does not support https and hence the network traffic between clients and server is left unencrypted. However, one could imagine installing a small nginx or apache web server as a front-end on the same machine, setting security up at their level and inserting a redirection to the python module listening on localhost over http, a quick and easy solution that does not require any change in the code, although that would be way out of the scope of the module, whose primary goal is to serve requests from the same machine it is running on. Note that if we start talking about adding another web server, we could as well move all the repository browsing code into a separate (Fast)CGI python program directly invoked by the web server and make it available to any allowed networked user as a full-blown service, complete with authentication and access rights.

    Conclusion

    This tool is really a nice utility for browsing repositories, especially those running on Unix/Linux machines because most of the time those servers are headless and have no GUI applications installed. The tool interfaces any browser, running on any O/S or device, with such repositories and alleviates the usual burden of executing getfile API statements and scp commands to transfer the contents to the desktop for visualization. For this precise functionality, it is even better than dqman, at least for browsing and visualizing browser-readable contents.
    There is a lot of room for improvement if one would like a full repository browser, e.g. to display the metadata as well. In addition, if needed, the original module’s functionality, browsing the local sub-directory tree, could be reestablished as it is not incompatible with repositories.
    The tool also proves again that the approach of picking an existing tool that implements most of the requirements and customizing it to a specific need is quite an effective one.

    Cet article A Simple Repository Browser Utility est apparu en premier sur Blog dbi services.
