Feed aggregator

Best Practice: Does the SQL Mantra simplify Data Verification?

Tom Kyte - 3 hours 54 min ago
Tom-- I subscribe to your SQL mantra for data (and truly, is there anything else? :)) which is: 1. Do it in a single SQL statement if at all possible. 2. If you cannot, then do it in PL/SQL (as little PL/SQL as possible!). 3. If you cannot do it...
Categories: DBA Blogs

Loading records in separate rows using sqlloader

Tom Kyte - 3 hours 54 min ago
Hi, First of all thanks a lot for answering my previous questions, that helped me a lot. Now I came up with a new questions regarding SQL loader. So, basically I have a file that contains a records like below A|B|C|D| Now, I have a requirem...
Categories: DBA Blogs

Switch objects to add where clause

Tom Kyte - 3 hours 54 min ago
Hi , We have a process that cannot be changed and that executes a query over a specific table. Due to a new development it is necessary to had a new statment to the WHERE clause of the query. But the executable cannot be changed. So the idea w...
Categories: DBA Blogs

Transaction set consistency

Tom Kyte - 3 hours 54 min ago
hello , i am reading <<Database Concepts>> , i can not understand "Transaction set consistency" clearly , will you explain this to me pls ? can you show me some simple examples to show what is "Transaction set consistency" ? thanks . the...
Categories: DBA Blogs

Quarterly EBS Upgrade Recommendations: Jan 2018 Edition

Steven Chan - 11 hours 45 min ago

We've previously provided advice on the general priorities for applying EBS updates and creating a comprehensive maintenance strategy.   

Here are our latest upgrade recommendations for E-Business Suite updates and technology stack components. These quarterly recommendations are based upon the latest updates to Oracle's product strategies, latest support timelines, and newly-certified releases.

You can research these yourself using this Note:

Upgrade Recommendations for January 2018

Check your EBS support status and patching baseline

  • EBS 12.2: Apply the minimum 12.2 patching baseline (EBS 12.2.3 + the latest technology stack updates listed below). In Premier Support to September 30, 2023.
  • EBS 12.1: Apply the minimum 12.1 patching baseline (12.1.3 Family Packs for products in use + the latest technology stack updates listed below). In Premier Support to December 31, 2021.
  • EBS 12.0: In Sustaining Support; no new patches available. Upgrade to 12.1.3 or 12.2. Before upgrading, 12.0 users should be on the minimum 12.0 patching baseline.
  • EBS 11.5.10: In Sustaining Support; no new patches available. Upgrade to 12.1.3 or 12.2. Before upgrading, 11i users should be on the minimum 11i patching baseline.

Apply the latest EBS suite-wide RPC or RUP

  • EBS 12.2: 12.2.7 (Sept. 2017)
  • EBS 12.1: 12.1.3 RPC5 (Aug. 2016)
  • EBS 12.0: 12.0.6
  • EBS 11.5.10: 11.5.10.2

Use the latest Rapid Install

  • EBS 12.2: StartCD 51 (Feb. 2016)
  • EBS 12.1: StartCD 13 (Aug. 2011)
  • EBS 12.0: 12.0.6
  • EBS 11.5.10: 11.5.10.2

Apply the latest EBS technology stack, tools, and libraries

  • EBS 12.2: AD/TXK Delta 10 (Sept. 2017); FND (Apr. 2017); EBS 12.2.6 OAF Update 8 (Dec. 2017); EBS 12.2.5 OAF Update 18 (Dec. 2017); EBS 12.2.4 OAF Update 18 (Dec. 2017); ETCC (Oct. 2017); Web Tier Utilities 11.1.1.9; Daylight Savings Time DSTv28 (Nov. 2016); upgrade to JDK 7
  • EBS 12.1: FND (Apr. 2017); OAF Bundle 5 (Jun. 2016); JTT Update 4 (Oct. 2016); Daylight Savings Time DSTv28 (Nov. 2016); upgrade to JDK 7

Apply the latest security updates

  • EBS 12.2 and 12.1: Jan. 2018 Critical Patch Update; SHA-2 PKI Certificates; SHA-2 Update for Web ADI & Report Manager; migrate from SSL or TLS 1.0 to TLS 1.2; sign JAR files
  • EBS 12.0: Oct. 2015 Critical Patch Update
  • EBS 11.5.10: April 2016 Critical Patch Update

Use the latest certified desktop components (EBS 12.2 and 12.1)

  • Use the latest JRE 1.8, 1.7, or 1.6 release that meets your requirements
  • Switch to Java Web Start
  • Upgrade to IE 11
  • Upgrade to Firefox ESR 52
  • Upgrade Office 2003 and Office 2007 to later Office versions (e.g. Office 2016)
  • Upgrade Windows XP, Vista and Win 10v1507 to later versions (e.g. Windows 10v1607)

Upgrade to the latest database

  • All releases (EBS 12.2, 12.1, 12.0, 11.5.10): Database 11.2.0.4 or 12.1.0.2

If you're using Oracle Identity Management

  • EBS 12.2: Upgrade to Oracle Access Manager 11.1.2.3; upgrade to Oracle Internet Directory 11.1.1.9
  • EBS 12.1: Migrate from Oracle SSO to OAM 11.1.2.3; upgrade to Oracle Internet Directory 11.1.1.9

If you're using Oracle Discoverer (EBS 12.2 and 12.1)

  • Migrate to Oracle Business Intelligence Enterprise Edition (OBIEE) or Oracle Business Intelligence Applications (OBIA). Discoverer 11.1.1.7 is in Sustaining Support as of June 2017.

If you're using Oracle Portal

  • EBS 12.2: Migrate to Oracle WebCenter 11.1.1.9
  • EBS 12.1: Migrate to Oracle WebCenter 11.1.1.9 or upgrade to Portal 11.1.1.6 (End of Life Jun. 2017)
Categories: APPS Blogs

Nested MVs

Jonathan Lewis - 14 hours 56 min ago

A recent client was seeing a very large redo penalty from refreshing materialized views. Unfortunately they had to be refreshed very frequently, and were being handled with a complete refresh in atomic mode – which means delete every row from every MV then re-insert every row.  The total redo was running at about 5GB per hour, which wasn’t a problem for throughput, but the space for handling backup and recovery was getting a bit extreme.

The requirement consisted of two MVs which extracted and aggregated row and column subsets in two different ways from a single table; then two MVs that aggregated one of the first MVs in two different ways; then two MVs which each joined one of the first level MVs to one of the second level MVs.

No problem – join MVs are legal, aggregate MVs are legal, “nested” MVs are legal: all you have to do is create the right MV logs and pick the right refresh command.  Since the client was also running Standard Edition (SE2) there was no need to worry about how to ensure that query rewrite would work (the feature is not implemented on SE).

So here, simplified and camouflaged, is a minimum subset of just the first few stages of the construction: a base table with MV log, one first-level aggregate MV with its own MV log, and two aggregate MVs based on the first MV.

drop materialized view log on req_line;
drop materialized view log on jpl_req_group_numlines;

drop materialized view jpl_req_group_numlines;
drop materialized view jpl_req_numsel;
drop materialized view jpl_req_basis;

drop table req_line;

-- ----------
-- Base Table
-- ----------

create table req_line(
        eventid         number(10,0),
        selected        number(10,0),
        req             number(10,0),
        basis           number(10,0),
        lnid            number(10,0),
        area            varchar2(10),
        excess          number(10,0),
        available       number(10,0),
        kk_id           number(10,0),
        eventdate       number(10,0),
        rs_id           number(10,0)
)
;

-- --------------------
-- MV log on base table
-- --------------------

create materialized view log 
on
req_line
with rowid(
        req, basis, lnid, eventid, selected, area,
        excess, available, kk_id, eventdate, rs_id
)
including new values
;

-- --------------------
-- Level 1 aggregate MV
-- --------------------

create materialized view jpl_req_group_numlines(
        eventid, selected, 
        row_ct, req_ct, basis_ct, req, basis, 
        maxlnid, excess, numsel, area, available, kk_id, 
        rs_id, eventdate
)
segment creation immediate
build immediate
refresh fast on demand 
as 
select 
        eventid,
        selected,
        count(*)        row_ct,
        count(req)      req_ct,
        count(basis)    basis_ct,
        sum(req)        req,
        sum(basis)      basis,
        max(lnid)       maxlnid,
        excess,
        count(selected) numsel,
        area,
        available,
        kk_id,
        rs_id,
        eventdate
from 
        req_line
group by 
        eventid, selected, area, excess,
        available, kk_id, eventdate, rs_id
;

-- ------------------------
-- MV log on first level MV
-- ------------------------

create materialized view log 
on
jpl_req_group_numlines
with rowid 
(
        eventid, area, selected, available,
        basis, req, maxlnid, numsel
)
including new values
;


-- ----------------------------
-- First "level 2" aggregate MV
-- ----------------------------

create materialized view jpl_req_numsel(
        eventid, selected, 
        row_ct, totalreq_ct, totalbasis_ct, totalreq, totalbasis, 
        maxlnid, numsel_ct, numsel, area
)
segment creation immediate
build immediate
refresh fast on demand
as 
select 
        eventid,
        selected,
        count(*)        row_ct,
        count(req)      req_ct,
        count(basis)    basis_ct,
        sum(req)        req,
        sum(basis)      basis,
        max(maxlnid)    maxlnid,
        count(numsel)   numsel_ct,
        sum(numsel)     numsel,
        area
from 
        jpl_req_group_numlines
group by 
        eventid, selected, area
;


-- -----------------------------
-- Second "level 2" aggregate MV
-- -----------------------------

create materialized view jpl_req_basis(
        eventid, 
        row_ct, totalbasis_ct, totalreq_ct, totalbasis, totalreq, 
        area, selected, available, maxlnid ,
        numsel_ct, numsel
)
segment creation immediate
build immediate
refresh fast on demand
as 
select 
        eventid,
        count(*)        row_ct,
        count(basis)    totalbasis_ct,
        count(req)      totalreq_ct,
        sum(basis)      totalbasis,
        sum(req)        totalreq,
        area,
        selected,
        available,
        max(maxlnid)    maxlnid,
        count(numsel)   numsel_ct,
        sum(numsel)     numsel
from
        jpl_req_group_numlines
group by 
        eventid, area, available, selected
;

Once the table, MV logs and MVs exist we can insert some data into the base table, then try refreshing the views. I have tried three different refresh calls from the dbms_mview package (refresh_all_mviews(), refresh_dependent(), and refresh()), specifying the ‘F’ (fast) refresh method, atomic refresh, and nested refresh. All three fail in the same way on 12.2.0.1. The code below shows only the refresh_dependent() call.

I’ve included a query to report the current state of the materialized views before and after the calls, and set a two second sleep before the refresh so that changes in “last refresh” time will appear. The final queries are just to check that the expected volume of data has been transferred to the materialized views.


-- ------------------------------------
-- Insert some data into the base table
-- ------------------------------------

begin
        for i in 1..100 loop
                execute immediate 'insert into req_line values( :xxx, :xxx, :xxx, :xxx, :xxx, :xxx, :xxx, :xxx, :xxx, :xxx, :xxx)' 
                using i,i,i,i,i,i,i,i,i,i,i;
                commit;
        end loop;
end;
/

set linesize 144
column mview_name format a40

select
        mview_name, staleness, compile_state, last_refresh_type, 
        to_char(last_refresh_date,'dd-mon hh24:mi:ss')          ref_time
from
        user_mviews
ORDER by
        last_refresh_date, mview_name
;

prompt  Waiting for 2 seconds to allow refresh time to change

execute dbms_lock.sleep(2)

declare
        m_fail_ct       number(6,0);
begin
        dbms_mview.refresh_dependent(
                number_of_failures      => m_fail_ct,
                list                    => 'req_line',
                method                  => 'F',
                nested                  => true,
                atomic_refresh          => true
        );

        dbms_output.put_line('Failures: ' || m_fail_ct);
end;
/

select
        mview_name, staleness, compile_state, last_refresh_type, 
        to_char(last_refresh_date,'dd-mon hh24:mi:ss')          ref_time
from
        user_mviews
order by
        last_refresh_date, mview_name
;

-- --------------------------------
-- Should be 100 rows in each table
-- --------------------------------

select count(*) from jpl_req_basis;
select count(*) from jpl_req_group_numlines;
select count(*) from jpl_req_numsel;

Both of the earlier versions of Oracle are happy with this code and refresh all three materialized views without a problem. Oracle 12.2.0.1 crashes the procedure call with a deadlock error which, when traced, shows itself to be a self-deadlock while attempting to select a data dictionary row for update:


MVIEW_NAME                               STALENESS	     COMPILE_STATE	 LAST_REF REF_TIME
---------------------------------------- ------------------- ------------------- -------- ------------------------
JPL_REQ_BASIS                            FRESH		     VALID		 COMPLETE 19-jan 14:03:01
JPL_REQ_GROUP_NUMLINES			 NEEDS_COMPILE	     NEEDS_COMPILE	 COMPLETE 19-jan 14:03:01
JPL_REQ_NUMSEL                           FRESH		     VALID		 COMPLETE 19-jan 14:03:01

3 rows selected.

Waiting for 2 seconds to allow refresh time to change

PL/SQL procedure successfully completed.

declare
*
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 2952
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 85
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 245
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 1243
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 2414
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 2908
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 3699
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 3723
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 75
ORA-06512: at line 4


MVIEW_NAME				 STALENESS	     COMPILE_STATE	 LAST_REF REF_TIME
---------------------------------------- ------------------- ------------------- -------- ------------------------
JPL_REQ_NUMSEL                           NEEDS_COMPILE	     NEEDS_COMPILE	 COMPLETE 19-jan 14:03:01
JPL_REQ_BASIS                            FRESH		     VALID		 FAST	  19-jan 14:03:04
JPL_REQ_GROUP_NUMLINES                   FRESH		     VALID		 FAST	  19-jan 14:03:04

The deadlock graph from the trace file, with a little extra surrounding information, looks like this:


Deadlock graph:
                                          ------------Blocker(s)-----------  ------------Waiter(s)------------
Resource Name                             process session holds waits serial  process session holds waits serial
TX-00020009-00000C78-A9B090F8-00000000         26      14     X        40306      26      14           X  40306


*** 2018-01-19T14:18:03.925859+00:00 (ORCL(3))
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=1, mask=0x0)
----- Error Stack Dump -----
----- Current SQL Statement for this session (sql_id=2vnzfjzg6px33) -----
select log, oldest, oldest_pk, oldest_oid, oldest_new, youngest+1/86400,  flag, yscn, oldest_seq, oscn, oscn_pk, oscn_oid, oscn_new, oscn_seq  from sys.mlog$ where mowner = :1 and master = :2 for update
----- PL/SQL Stack -----

So far I haven’t been able to spot whether or not I’m doing something wrong, or prohibited, and I haven’t been able to find a matching problem on MoS. Since the code works on 11gR2 and 12cR1 I’m inclined to believe it’s a bug introduced in the 12cR2 timeline – which is a nuisance for my client, but if it is a bug then perhaps a fix will appear fairly promptly.

Real-time Sailing Yacht Performance - Getting Started (Part 1)

Rittman Mead Consulting - 19 hours 46 min ago

In this series of articles, I intend to look at collecting and analysing our yacht’s data. I aim to show how a number of technologies can be used to achieve this and the thought processes around the build and exploration of the data. Ultimately, I want to improve our sailing performance with data. This is not a new concept for professional teams, but I have a limited amount of hardware and funds (unlike Oracle, it seems), so it's time for a bit of DIY!

In this article, I introduce some concepts and terms then I'll start cleaning and exploring the data.

Background

I have owned a Sigma 400 sailing yacht for over twelve years and she is used primarily for cruising but every now and then we do a bit of offshore racing.

In the last few years we have moved from paper charts and a very much manual way of life to electronic charts and iOS apps for navigation.

In 2017 we started to use weather modelling software to predict the optimal route for a passage, taking wind, tide and estimated boat performance (polars) into consideration.

The predicted routes are driven in part by a boat's polars. The original polars are a set of theoretical calculations created by the boat’s designer defining what the boat should do at each wind speed and angle of sailing. Polars give us a plot of the boat's speed for a given true wind speed and angle. This in turn tells us the optimal speed the boat could achieve at any particular wind angle and wind speed (not taking into consideration helming accuracy, sea state, condition of sails and sail trim; it may be possible for me to derive different polars for different weather conditions). Fundamentally, polars also give us an indication of the optimal angle to the wind to get to our destination (velocity made good).

The polars we use at the moment are based on a similar boat to the Sigma 400 but are really a best guess. I want our polars to be more accurate. I would also like to start tracking the boat's performance in real time and post-passage for further analysis.

The purpose of this blog is to use our boat's instrument data to create accurate polars for a number of conditions and get a better understanding of our boat's performance at each point of sail. I would also like to see what can be achieved with the AIS data. I intend to use Python to check and decode the data. I will look at a number of tools to store, buffer, visualise and analyse the outputs.

So let’s look at the technology on-board.

Instrumentation Architecture

The instruments are by Raymarine. We have a wind vane, GPS, speed sensor, depth sounder and sea temperature gauge, electronic compass, gyroscope, and rudder angle reader. These are all fed into a central course computer. Some of the instrument displays share and enrich the data, calculating such things as apparent wind angle, for example. All the data travels through a proprietary Raymarine messaging system called SeaTalk. To allow Raymarine instruments to interact with other instrumentation there is an NMEA-0183 port. NMEA-0183 is a well-known communication protocol and is fairly well documented, so this is the data I need to extract from the system. I currently have an NMEA-0183 cable connecting the Raymarine instruments to an AIS transponder. The AIS transponder includes a wireless router, which enables me to connect portable devices to the instrumentation.

The first task is to start looking at the data and get an understanding of what needs to be done before I can start analysing.

Analysing the data

There is a USB connection from the AIS hub; however, the instructions do warn that this should only be used during installation. I did spool some data from the USB port and it seemed to work OK. I could connect directly to the NMEA-0183 output, but that would require me to do some wiring, so I will look at that if the reliability of the wireless causes issues. The second option was to use the wireless connection. I start by spooling the data to a log file using nc (nc is basically OSX's version of netcat, a TCP and UDP tool).

Spooling the data to a log file

nc  -p 1234 192.168.1.1 2000 > instrument.log

The spooled data gave me a clear indication that there would need to be some sanity checking of the data before it would be useful. The data is split into a number of different message types, each with a different structure. I will convert these messages into JSON format so that they are more readable downstream. In the example below the displayed timestamps are attached using awk, but my Python script will handle any enrichment as I build out.

The data is comma separated, which makes things easy, and there are a number of good websites that describe the contents of the messages. Looking at the data using a series of awk commands I can clearly identify three main types of messages: GPS, AIS and integrated instrument messages. Each message ends in a two-digit hex checksum that can be validated by XOR'ing the characters of the message.

Looking at example wind messages

We get two messages related to the wind, true and apparent; the values are almost identical because the boat was stationary.

$IIMWV,180.0,R,3.7,N,A*30
$IIMWV,180.0,T,3.8,N,A*30

These are Integrated Instrument Mast Wind Vane (IIMWV) messages (I have made an assumption about the meaning of the M, so if you are an expert in these messages feel free to correct me ;-)).

These messages break down to:

  1. $IIMWV II Talker, MWV Sentence
  2. 180.0 Wind Angle 0 - 359
  3. R Relative (T = True)
  4. 3.7 Wind Speed
  5. N Wind Speed Units: Knots (K = KPH, M = MPH)
  6. A Status (A = Valid)
  7. *30 Checksum

And in English (ish)

Wind angle 180.0 degrees relative, wind speed 3.7 knots.
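As a quick illustration of that checksum rule (this is my own sketch, not code from the original post), the characters between the '$' and the '*' can be XOR'd together and compared with the transmitted hex value:

from functools import reduce

def nmea_checksum_ok(sentence):
    #take the payload between the leading '$' and the '*'
    payload, _, transmitted = sentence.strip()[1:].partition("*")
    #XOR all payload characters together
    calculated = reduce(lambda acc, ch: acc ^ ord(ch), payload, 0)
    return format(calculated, "02X") == transmitted.upper()

print(nmea_checksum_ok("$IIMWV,180.0,R,3.7,N,A*30"))   #True for the wind message above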

Example corrupted message

$GPRMC,100851.00,A,5048.73249,N,00005.86148,W,0.01**$GPGGA**,100851.00,5048.73249,N,00005.8614$GPGLL,5048.73249,N,00005.86148,W,100851.0

It looks like the message failed to get a new line. I noticed a number of other types of incomplete or corrupted messages, so checking them will be an essential part of the build.

Creating a message reader

I don't really want to sit on the boat building code. I need to be doing this while traveling and at home when I get time. So, spooling half an hour of data to a log file gets me started. I can use Python to read from the file and, once up and running, spool the log file to a local TCP/IP port and read it using the Python socket library.

Firstly, I read the log file and loop through the messages, checking each one for validity using the checksum and line length. I used this to log the number of messages in error, etc. I have posted the test function below; I'm sure there are better ways to write the code, but it works.

#Function to test whether a message is valid
def is_message_valid (orig_line):

  #check if the checksum is valid
  #set variables
  x = 1
  check = 0
  received_checksum = 0
  line_length = len(orig_line.strip())

  while (x <= line_length):
    current_char = orig_line[x]
    #checksum is always the two chars after the *
    if current_char == "*":
      received_checksum = orig_line[x+1] + orig_line[x+2]
      #check where we are in the line; if there is more to decode then
      #have to take into account the new line
      if line_length > (x+3):
        check = 0

      #no need to continue to the end of the 
      #line either error or have checksum
      break

    check = check^ord(current_char)
    x = x + 1; 

  if format(check,"02X") == received_checksum:
    #substring the new line for printing
    #print "Processed nmea line >> " + orig_line[:-1] + " Valid message" 
    _Valid = 1
  else:
    #substring the new line for printing
    _Valid = 0

  return _Valid

Now for the translation of the messages. There are a number of example Python packages on GitHub that translate NMEA messages, but I am currently only interested in specific messages and I also want to build appropriate JSON, so I feel I am better off writing this from scratch. Python has JSON libraries, so it's fairly straightforward once the message is defined. I start by looking at the wind and depth messages. I'm not currently seeing any speed messages, hopefully because the boat wasn't moving.

import json

def convert_iimwv_json (orig_line):
 #iimwv wind instrumentation

 column_list = orig_line.split(",")

 #the star separates the checksum from the status
 status_check_sum = column_list[5].split("*")
 checksum_value = status_check_sum[1]

 json_str = {'message_type' : column_list[0],
  'wind_angle' : column_list[1],
  'relative' : column_list[2],
  'wind_speed' : column_list[3],
  'wind_speed_units' : column_list[4],
  'status' : status_check_sum[0],
  'checksum' : checksum_value[0:2]}

 #round-trip through the json library to make sure the structure serialises cleanly
 json_dmp = json.dumps(json_str)
 json_obj = json.loads(json_dmp)

 return json_obj
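As a quick sanity check (my own example, not from the original post), feeding the sample wind sentence from earlier through this function gives a small dictionary that can be dumped as JSON:

sample = "$IIMWV,180.0,R,3.7,N,A*30"
wind = convert_iimwv_json(sample)
print(json.dumps(wind, indent=2))
#prints message_type "$IIMWV", wind_angle "180.0", relative "R", wind_speed "3.7",
#wind_speed_units "N", status "A" and checksum "30"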

I now have a way of checking, reading and converting the messages to JSON from a log file. Switching from reading a file to using the Python socket library, I can read the stream directly from a TCP/IP port. Using nc it's possible to simulate the messages being sent from the instruments by piping the log file to a port.

Opening port 1234 and listening for terminal input

nc -l 1234
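To show how the pieces fit together, here is a minimal sketch of the socket-based reader (my own illustration; the host address and port are assumptions, and it reuses the two functions above):

import socket

HOST = "192.168.1.1"   #assumed address of the wireless hub
PORT = 2000            #assumed NMEA-0183 streaming port

def read_nmea_stream(host=HOST, port=PORT):
    #connect to the instrument hub and yield one NMEA sentence at a time
    with socket.create_connection((host, port)) as sock:
        buffer = b""
        while True:
            data = sock.recv(4096)
            if not data:
                break
            buffer += data
            while b"\n" in buffer:
                line, buffer = buffer.split(b"\n", 1)
                yield line.decode("ascii", errors="ignore") + "\n"

for sentence in read_nmea_stream():
    if is_message_valid(sentence) and sentence.startswith("$IIMWV"):
        print(convert_iimwv_json(sentence))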

Having spoken to some experts from Digital Yachts, it may be that the missing messages are because Raymarine SeaTalk is not transmitting an NMEA message for speed and a number of other readings. The way I have wired up the NMEA inputs and outputs to the AIS hub may also be causing the doubling up of messages and the apparent corruptions. I need more kit: a bi-directional SeaTalk to NMEA converter.

In the next article, I will discuss the use of Kafka in the architecture. I want to buffer all my incoming raw messages; if I store everything that comes in, I can build out the analytics over time, i.e. as I decode each message type. I will also set about creating a near real-time dashboard to display the incoming metrics. The use of Kafka will give me scalability in the model. I'm particularly thinking of the Round the Island Race: 1,800 boats, a good number of which will be transmitting AIS data.


Categories: BI & Warehousing

Partner Webcast – Simplifying Mobility with Oracle Mobile and Chatbots Cloud

Mobile is everywhere and continues to be the dominant way we consume information and services, but mobile apps are facing pressure to change the way they are built and consumed. As consumers,we...

We share our skills to maximize your revenue!
Categories: DBA Blogs

SQL Server on Linux and logging

Yann Neuhaus - 22 hours 55 sec ago

In the Windows world, SQL Server logs information both into the SQL Server error log and the Application log. Both automatically timestamp all recorded events. Unlike the SQL Server error log, the Windows application log provides an overall picture of events that occur globally on the Windows operating system. Thus, when issues are encountered, taking a look at such event logs – by using either the Windows event viewer or the Get-EventLog PowerShell cmdlet – may be very helpful to figure out whether they are only SQL Server-scoped or whether you have to correlate them with other operating system issues.

But what about SQL Server on Linux? Obviously, we may use a similar approach. As on Windows, SQL Server logs information both in the SQL Server error log, located in /var/opt/mssql/log/, and in the Linux logs. Because SQL Server is only supported on Linux distributions that all include systemd (RHEL 7.3+, SLES V12 SP2+ or Ubuntu 16.04+), we have to go through the journalctl command to browse the messages related to the SQL Server instance.

systemd-journald is a system service that collects and stores logging data based on logging information that is received from a variety of sources – Kernel and user log messages. All can be viewed through the journalctl command.

Let’s say that the journalctl command is very powerful and I don’t aim to cover all the possibilities. My intention is only to dig into some examples in the context of SQL Server. Conceptually this is not so different from what we usually do on a Windows system for basic stuff.

Firstly, we may use a couple of options to filter the records we want to display. Probably the most intuitive way to use the journalctl command is with the time interval parameters --since and --until, as follows:

[root@sqllinux ~] journalctl --since "2018-01-16 12:00:00" --until "2018-01-16 23:30:00"

Here a sample of the corresponding output:

blog 126 - 1 - journalctl with time interval filter

All log messages are displayed, including the kernel ones. But rather than using time interval filters we may prefer to use the -b parameter to show all log messages since the last system boot, for instance:

[root@sqllinux ~] journalctl -b

The corresponding output:

blog 126 - 2 - journalctl with last boot filter

You may use different commands to get the system boot time, such as uptime or who -b. I’m in favour of the last reboot command because it provides the last reboot date rather than the uptime of the system.

Furthermore, one interesting point is that if you want to get log messages from older system boots (and not only the last one) you have to configure systemd-journald accordingly to enable log persistence. By default, storage is volatile and logs are cleared after each system reboot. You may get this information directly from the systemd-journald configuration file (#Storage=auto by default):

[root@sqllinux ~] cat /etc/systemd/journald.conf
…
[Journal]
#Storage=auto
#Compress=yes
#Seal=yes
…

I remember a customer case where I had to diagnose a database integrity check job, scheduled every Sunday, that failed randomly. We finally figured out that the root cause was a system reboot after an automatic update. But the tricky part was that not all system reboots led to a failure of the DBCC CHECKDB command; according to the information from the Windows log, we understood it depended mostly on the DBCC CHECKDB execution time, which sometimes exceeded the time scheduled for the system reboot. So, in this case going back to the previous reboots (before the last one) was helpful for us. Note that for some Linux distributions log persistence is not the default option; my colleague Daniel Westermann from the dbi services open source team explained this well in his blog post, including how to change the default behaviour.

So, after applying the correct setup, if you want to display log messages after a pre-defined boot time you may first identify the different system boot times logged into the journal as follows:

[root@sqllinux ~] journalctl --list-boots
-1 576f0fb259f4433083c05329614d749e Tue 2018-01-16 15:41:15 CET—Wed 2018-01-17 20:30:41 CET
 0 ea3ec7019f8446959cfad0bba517a47e Wed 2018-01-17 20:33:30 CET—Wed 2018-01-17 20:37:05 CET

Then you may rewind the journal until the corresponding offset:

[root@sqllinux ~] journalctl -b -1 
-- Logs begin at Tue 2018-01-16 15:41:15 CET, end at Wed 2018-01-17 20:37:40 CET. --
Jan 16 15:41:15 localhost.localdomain systemd-journal[105]: Runtime journal is using 8.0M (max allowed 188.7M, trying to leave 283.1M free
 of 1.8G available → current limit 188.7M).
….

Let’s go ahead with filtering by unit (the mssql-server unit). This is likely the most useful way for DBAs to display only SQL Server related records, in combination with the aforementioned options (time interval or last boot(s) parameters). In the following example, I want to display SQL Server related records since a system boot that occurred on 18 January 2018 20:39 (I could also use time interval filters):

[root@sqllinux ~] journalctl -b -1 -u mssql-server.service 
-- Logs begin at Tue 2018-01-16 15:41:15 CET, end at Wed 2018-01-17 20:39:55 CET. --
Jan 16 15:41:17 sqllinux.dbi-services.test systemd[1]: [/usr/lib/systemd/system/mssql-server.service:21] Unknown lvalue 'TasksMax' in sect
ion 'Service'
Jan 16 20:47:15 sqllinux.dbi-services.test systemd[1]: Started Microsoft SQL Server Database Engine.
Jan 16 20:47:15 sqllinux.dbi-services.test systemd[1]: Starting Microsoft SQL Server Database Engine
...
Jan 16 20:47:22 sqllinux.dbi-services.test sqlservr[1119]: 2018-01-16 20:47:22.35 Server      Microsoft SQL Server 2017 (RTM-CU2) (KB40525
74) - 14.0.3008.27 (X64)
…

You may also want to get only the errors concerning your SQL Server instance. If you have already used syslog in the past you will be comfortable with systemd-journald, which implements the standard syslog message levels and priorities. Indeed, each message has its own priority, as shown below. The counterpart in the Windows event log is the event type (warning, error, critical, etc.). On Linux, priorities are identified by number – 6 corresponds to info messages and 3 to error messages. Here is a log message's anatomy with its priority value:

[root@sqllinux ~] journalctl -b -1 -u mssql-server.service -n 1 -o verbose
-- Logs begin at Tue 2018-01-16 15:41:15 CET, end at Wed 2018-01-17 20:48:36 CET. --
Wed 2018-01-17 20:30:38.937388 CET [s=5903eef6a5fd45e584ce03a4ae329ac3;i=88d;b=576f0fb259f4433083c05329614d749e;m=13e34d5fcf;t=562fde1d9a1
    PRIORITY=6
    _UID=0
    _GID=0
    _BOOT_ID=576f0fb259f4433083c05329614d749e
    _MACHINE_ID=70f4e4633f754037916dfb35844b4b16
    SYSLOG_FACILITY=3
    SYSLOG_IDENTIFIER=systemd
    CODE_FILE=src/core/job.c
    CODE_FUNCTION=job_log_status_message
    RESULT=done
    _TRANSPORT=journal
    _PID=1
    _COMM=systemd
    _EXE=/usr/lib/systemd/systemd
    _CAP_EFFECTIVE=1fffffffff
    _SYSTEMD_CGROUP=/
    CODE_LINE=784
    MESSAGE_ID=9d1aaa27d60140bd96365438aad20286
    _HOSTNAME=sqllinux.dbi-services.test
    _CMDLINE=/usr/lib/systemd/systemd --switched-root --system --deserialize 21
    _SELINUX_CONTEXT=system_u:system_r:init_t:s0
    UNIT=mssql-server.service
    MESSAGE=Stopped Microsoft SQL Server Database Engine.
    _SOURCE_REALTIME_TIMESTAMP=1516217438937388

 

So, if you want to restrict the output further to only warning, error or critical messages (from a daemon point of view), you may add the -p option with a range of priorities from 2 (critical) to 4 (warning), as shown below:

[root@sqllinux ~] journalctl -p 2..4 -u mssql-server.service
-- Logs begin at Tue 2018-01-16 15:41:15 CET, end at Wed 2018-01-17 21:44:04 CET. --
Jan 16 15:41:17 sqllinux.dbi-services.test systemd[1]: [/usr/lib/systemd/system/mssql-server.service:21] Unknown lvalue 'TasksMax' in sect
-- Reboot --
Jan 17 15:27:42 sqllinux.dbi-services.test systemd[1]: [/usr/lib/systemd/system/mssql-server.service:21] Unknown lvalue 'TasksMax' in sect
lines 1-4/4 (END)

Ultimately, filtering by message content will probably be the most natural way to find log messages. In this case there are no built-in parameters or options provided by the journalctl command, so grep will be your friend for sure. The following example is a classic customer case where we want to count the number of failed logins during a specific period. So, I have to use a combination of the journalctl, grep and wc commands:

[root@sqllinux ~] journalctl -u mssql-server.service --since "2018-01-17 12:00:00" --until "2018-01-17 23:00:00"  | grep "Login failed" | wc -l
31

Finally, the journalctl command offers real-time capabilities to follow log messages through the -f option. For very specific cases it might be useful. In the example below I can use it to follow SQL Server related log messages:

[root@sqllinux ~] journalctl -u mssql-server.service -f
-- Logs begin at Tue 2018-01-16 15:41:15 CET. --
Jan 17 21:52:57 sqllinux.dbi-services.test sqlservr[1121]: 2018-01-17 21:52:57.97 Logon       Error: 18456, Severity: 14, State: 8.
Jan 17 21:52:57 sqllinux.dbi-services.test sqlservr[1121]: 2018-01-17 21:52:57.97 Logon       Login failed for user 'sa'. Reason: Password did not match that for the login provided. [CLIENT: 192.168.40.30]

 

Another topic I wanted to introduce is centralized logging management. Nowadays, plenty of third-party tools like Splunk – or built-in Microsoft tools such as SCOM – may address this need both in the Windows and the Linux world. I also remember a particular customer case where we went through the built-in Windows event forwarding mechanism. In the Linux world, you may benefit from plenty of open source tools and you may also rely on built-in tools such as systemd-journal-remote, systemd-journal-upload and systemd-journal-gateway. I will probably go further into these tools in the future, but this time let’s use an older tool, rsyslog, which implements the basic syslog protocol and extends it with additional features. In this blog post I used a CentOS 7 distro that comes with rsyslog. The good news is that it also includes by default the imjournal module (which provides access to the systemd journal). This module reads logs from /run/log/journal and then writes out to /var/log/messages, /var/log/maillog, /var/log/secure or others depending on the record type. Log records may be sent over the TCP or UDP protocols and securing capabilities are also provided (by using TLS and certificates, for instance).

Just out of curiosity, I decided to implement a very simple log message forwarding scenario to centralize only SQL Server log messages. Basically, I only had to set up some parameters in /etc/rsyslog.conf on both sides (sender and receiver servers) as well as apply some firewall rules to allow the traffic on port 514. In addition, I used the TCP protocol because this is probably the simplest way to send log messages (the corresponding modules are already loaded). Here is an illustration of my scenario:

blog 126 - 3 - rsyslog architecture

Here are the configuration settings of my log message sender. You may notice that I used expression-based filters to send only the messages related to my SQL Server instance:

[root@sqllinux ~] cat /etc/rsyslog.conf
#### MODULES ####

# The imjournal module bellow is now used as a message source instead of imuxsock.
$ModLoad imuxsock # provides support for local system logging (e.g. via logger command)
$ModLoad imjournal # provides access to the systemd journal
#$ModLoad imklog # reads kernel messages (the same are read from journald)
#$ModLoad immark  # provides --MARK-- message capability
…
# remote host is: name/ip:port, e.g. 192.168.0.1:514, port optional
#*.* @@remote-host:514
if $programname == 'sqlservr' then @@192.168.40.21:514 
…

On the receiver side I configured the rsyslog daemon to accept messages coming over the TCP protocol on port 514. Here is a sample (only the interesting part) of the configuration file:

[root@sqllinux2 ~] cat /etc/rsyslog.conf
…
# Provides TCP syslog reception
$ModLoad imtcp
$InputTCPServerRun 514

 

Finally, I ran a simple test to check if the log message forwarding process works correctly by using the following T-SQL command from my SQLLINUX instance …

RAISERROR('test syslog from SQLLINUX instance', 10, 1) WITH LOG

 

… and after jumping to the receiver side (SQLLINUX2) I used the tail command to check if my message was sent correctly:

[root@sqllinux2 ~] tail -f /var/log/messages
…
Jan 18 21:50:40 sqllinux sqlservr: 2018-01-18 21:50:40.66 spid57      test syslog
Jan 18 21:51:03 sqllinux sqlservr: 2018-01-18 21:51:03.75 spid57      test syslog 1 2 3
Jan 18 21:52:08 sqllinux sqlservr: 2018-01-18 21:52:08.74 spid52      Using 'dbghelp.dll' version '4.0.5'
Jan 18 21:56:31 sqllinux sqlservr: 2018-01-18 21:56:31.13 spid57      test syslog from SQLLINUX instance

Well done!
In this blog post we’ve surfaced how SQL Server deals with the Linux logging system and how we may use the journalctl command to find information for troubleshooting. Moving from Windows to Linux in this field remains straightforward, with basically the same fundamentals. Obviously, Linux is a command-line oriented operating system, so you will not escape using the command line :-)

 

The article SQL Server on Linux and logging appeared first on the dbi services blog.

Scheduler Jobs stopped running after delete job_creator

Tom Kyte - 22 hours 14 min ago
I have 2 jobs that stopped running. They were marked as BROKEN. On *_SCHEDULER_JOB_LOG view it shows REASON="Job creator:[bda_user_xyz...] dropped". Both jobs are setted as MAX_FAILURES = null Does any one know if this is a bug? I can't find ...
Categories: DBA Blogs

Table Statistics Get Null or Empty

Tom Kyte - 22 hours 14 min ago
What could be the possible reason(s) why the table statistics (DBA_TAB_STATISTICS) got null or empty i.e. num_rows, last_analyzed columns? At certain date DBA_TAB_STATISTICS num_rows, last_analyzed columns have values (not empty/null), and...
Categories: DBA Blogs

Google friendly application URLs in APEX

Tom Kyte - 22 hours 14 min ago
Hi Now a days, there had been lots of discussion about having google friendly url for a web application. I have gone thru many links, but still confused. My question is - if I create a public application with Oracle Apex, could google able to ...
Categories: DBA Blogs

Analytical Function to compute running daily overtime by week

Tom Kyte - 22 hours 14 min ago
Hi there, I may be overthinking this query and would really appreciate some input/kick in the pants. How in the heck do I get daily overtime based on a 40 hour work week? If the week isn't complete but an employee has racked up more than 40 hours...
Categories: DBA Blogs

Reading image file

Tom Kyte - 22 hours 14 min ago
Hello MR.TOM please i have answer Reading image file to custom table and export the image into a new file. Let's say we have a file called 'horse.jpg' and upload it to a table. After that you wat to export it to a new file horse1.jpg...
Categories: DBA Blogs

Should I use SQL or Python?

Bobby Durrett's DBA Blog - Thu, 2018-01-18 13:57

We had an outage on an important application last Thursday. A particular SQL statement locked up our database with library cache: mutex X waits. I worked with Oracle support to find a bug that caused the issue and we came up with a good workaround. The bug caused a bunch of shared cursor entries. So, I wanted to run a test on a test database to recreate the excess shared cursor entries. I wanted to run the SQL query that caused the outage a bunch of times. Also, we embed the SQL query inside a PL/SQL procedure so I wanted to run the query by calling the procedure. So, I needed to come up with a bunch of calls to the procedure using realistic data as a test script. This blog post is about the decision I had to make about creating the test script. Would I use SQL or Python to quickly hack together my test script? I thought it would be interesting to write about my choice because I am working on my Python for the Oracle DBA talk that encourages Oracle DBAs to learn Python. In this situation I turned to SQL instead of Python so what does that say about the value of Python for Oracle DBAs?

Let me lay out the problem that I needed to solve. Note that I was trying to get this done quickly and not spend a lot of time coming up with the perfect way to do it. I had over 6000 sets of bind variable values that the problem query had used in the past. I used my bind2.sql script to get some sample bind variable values for the problem query. The output of bind2.sql was in this format:

2017-11-27 15:08:56 :B1 1
2017-11-27 15:08:56 :B2 ABC
2017-11-27 15:08:56 :B3 JAFSDFSF
2017-11-27 15:08:56 :B4 345
2017-11-27 15:08:56 :B5 6345
2017-11-27 15:08:56 :B6 10456775
2017-11-27 15:08:56 :B7 34563465
2017-11-27 15:08:56 :B8 433
2017-11-27 15:09:58 :B1 1
2017-11-27 15:09:58 :B2 JUL
2017-11-27 15:09:58 :B3 KSFJSDJF
2017-11-27 15:09:58 :B4 234
2017-11-27 15:09:58 :B5 234253
2017-11-27 15:09:58 :B6 245
2017-11-27 15:09:58 :B7 66546
2017-11-27 15:09:58 :B8 657576
2017-11-27 15:10:12 :B1 1
2017-11-27 15:10:12 :B2 NULL
2017-11-27 15:10:12 :B3 NULL
2017-11-27 15:10:12 :B4 45646
2017-11-27 15:10:12 :B5 43
2017-11-27 15:10:12 :B6 3477
2017-11-27 15:10:12 :B7 6446
2017-11-27 15:10:12 :B8 474747

I needed to convert it to look like this:

exec myproc(34563465,10456775,345,433,6345,'JAFSDFSF','ABC',1,rc);
exec myproc(66546,245,234,657576,234253,'KSFJSDJF','JUL',1,rc);
exec myproc(6446,3477,45646,474747,43,'NULL','NULL',1,rc);

I gave myself maybe a minute or two to decide between using SQL or Python. I chose SQL. All I did was insert the data into a table and then manipulate it using SQL statements. Note that the order of the arguments in the procedure call is not the same as the order of the bind variable numbers. Also, some are character types and some are number types.

Here is the SQL that I used:

drop table bindvars;

create table bindvars
(datetime varchar2(20),
 varname varchar2(2),
 varvalue varchar2(40));

insert into bindvars values ('2017-11-27 15:08:56','B1','1');
insert into bindvars values ('2017-11-27 15:08:56','B2','ABC');
insert into bindvars values ('2017-11-27 15:08:56','B3','JAFSDFSF');
insert into bindvars values ('2017-11-27 15:08:56','B4','345');
insert into bindvars values ('2017-11-27 15:08:56','B5','6345');
insert into bindvars values ('2017-11-27 15:08:56','B6','10456775');
insert into bindvars values ('2017-11-27 15:08:56','B7','34563465');
insert into bindvars values ('2017-11-27 15:08:56','B8','433');
insert into bindvars values ('2017-11-27 15:09:58','B1','1');
insert into bindvars values ('2017-11-27 15:09:58','B2','JUL');
insert into bindvars values ('2017-11-27 15:09:58','B3','KSFJSDJF');
insert into bindvars values ('2017-11-27 15:09:58','B4','234');
insert into bindvars values ('2017-11-27 15:09:58','B5','234253');
insert into bindvars values ('2017-11-27 15:09:58','B6','245');
insert into bindvars values ('2017-11-27 15:09:58','B7','66546');
insert into bindvars values ('2017-11-27 15:09:58','B8','657576');
insert into bindvars values ('2017-11-27 15:10:12','B1','1');
insert into bindvars values ('2017-11-27 15:10:12','B2','NULL');
insert into bindvars values ('2017-11-27 15:10:12','B3','NULL');
insert into bindvars values ('2017-11-27 15:10:12','B4','45646');
insert into bindvars values ('2017-11-27 15:10:12','B5','43');
insert into bindvars values ('2017-11-27 15:10:12','B6','3477');
insert into bindvars values ('2017-11-27 15:10:12','B7','6446');
insert into bindvars values ('2017-11-27 15:10:12','B8','474747');

commit;

drop table bindvars2;

create table bindvars2 as
select 
b1.varvalue b1,
b2.varvalue b2,
b3.varvalue b3,
b4.varvalue b4,
b5.varvalue b5,
b6.varvalue b6,
b7.varvalue b7,
b8.varvalue b8
from 
bindvars b1,
bindvars b2,
bindvars b3,
bindvars b4,
bindvars b5,
bindvars b6,
bindvars b7,
bindvars b8
where
b1.datetime = b2.datetime and
b1.datetime = b3.datetime and
b1.datetime = b4.datetime and
b1.datetime = b5.datetime and
b1.datetime = b6.datetime and
b1.datetime = b7.datetime and
b1.datetime = b8.datetime and
b1.varname = 'B1' and
b2.varname = 'B2' and
b3.varname = 'B3' and
b4.varname = 'B4' and
b5.varname = 'B5' and
b6.varname = 'B6' and
b7.varname = 'B7' and
b8.varname = 'B8';

select 'exec myproc('||
B7||','||
B6||','||
B4||','||
B8||','||
B5||','''||
B3||''','''||
B2||''','||
B1||',rc);'
from bindvars2;

I hacked the insert statements together with my Textpad text editor and then loaded the rows into a little table. Then I built a new table which combines the 8 rows for each call into a single row with a column for each bind variable. Finally I queried the second table generating the procedure calls with single quotes, commas and other characters all in the right place.

Now that the rush is past and my testing is done I thought I would hack together a quick Python script to do the same thing. If I had chosen Python, how would I have done it without spending a lot of time making it optimal? Here is what I came up with:

Instead of insert statements I pulled the data into a multi-line string constant. Then I split it into a list of strings, each string representing a line. Then I split each line into space-delimited fields so that each line would have a date, time, bind variable name and bind variable value. Finally I looped through each set of 8 lines, extracting the bind variable values and then printing them in the correct order and format.
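As a rough sketch of that approach (my own reconstruction from the description above, not Bobby's actual script, and truncated to one sample set of bind values):

#multi-line string constant holding the bind2.sql output (one set of bind values shown)
bind_data = """2017-11-27 15:08:56 :B1 1
2017-11-27 15:08:56 :B2 ABC
2017-11-27 15:08:56 :B3 JAFSDFSF
2017-11-27 15:08:56 :B4 345
2017-11-27 15:08:56 :B5 6345
2017-11-27 15:08:56 :B6 10456775
2017-11-27 15:08:56 :B7 34563465
2017-11-27 15:08:56 :B8 433"""

#one list entry per line: date, time, bind variable name, bind variable value
lines = [line.split() for line in bind_data.splitlines()]

#process the lines in groups of 8 (B1..B8 belong to one execution)
for i in range(0, len(lines), 8):
    group = lines[i:i + 8]
    b1, b2, b3, b4, b5, b6, b7, b8 = [fields[3] for fields in group]
    #print the call with the arguments in the order the procedure expects
    print("exec myproc({},{},{},{},{},'{}','{}',{},rc);".format(b7, b6, b4, b8, b5, b3, b2, b1))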

These are two quickly hacked together solutions. I think the key point is how I stored the data. With SQL I used tables. With Python I used lists. I’m not sure which I like better in this case. I’ve been doing SQL longer but Python wasn’t really harder. I guess my decision under pressure to use SQL shows that I still have more comfort with the SQL way of doing things, but my after the fact Python hacking shows that the Python solution was not any harder. FWIW.

Bobby

 

Categories: DBA Blogs

Alfresco DevCon 2018 – Day 2 – Big files, Solr Sharding and Minecraft, again!

Yann Neuhaus - Thu, 2018-01-18 13:46

Today is the second day of the Alfresco DevCon 2018 and therefore yes, it is already over, unfortunately. In this blog, I will be continuing my previous one with sessions I attended on the afternoon of the day-1 as well as day-2. There were too many interesting sessions and I don’t really have the time to talk about all of them… But if you are interested, all the sessions were recorded (as always) so wait a little bit and check out the DevCon website, the Alfresco Community or the Alfresco Youtube channel and I’m sure you will find all the recordings as soon as they are available.

 

So on the afternoon of the day-1, I started with a presentation by Jeff Potts, you all know him, and he was talking about how to move gigantic files (several gigabytes) in (upload) and out (download) of Alfresco. He basically presented a use case where the users had to manage big files and put them all in Alfresco with as little headache as possible. On paper, Alfresco can handle any file no matter the size because the only limit is what the File System of the Alfresco Server supports. However, when you start working with 10 or 20 GB files, you can sometimes have issues like exceptions, timeouts, network outages, aso… It might not be frequent but it can happen for a variety of reasons (not always linked to Alfresco). The use case here was to simplify the import into Alfresco and make it faster. Jeff tested a lot of possible solutions like using the Desktop Sync, CMIS, FTP, the resumable upload share add-on, aso…

In the end, a pure simple (1 stream) upload/download will always be limited by the network. So he tried to work on improving this part and used the Resilio Sync software (formerly BitTorrent Sync). This tool can be used to stream a file to the Alfresco Server, BitTorrent style (P2P). But the main problem of this solution is that P2P is only as good as the number of users having this specific file available on their workstation… Depending on the use case, it might increase the performance but it wasn’t ideal.

In the end, Jeff came across the “GridFTP” protocol. This is an extension of FTP for grid computing whose purpose is to make file transfers more reliable and faster by using multiple simultaneous TCP streams. There are several implementations of GridFTP, like the Globus Toolkit. Basically, the solution in this case was to use Globus to transfer the big files from the user’s workstation to a dedicated File System which is mounted on the Alfresco Server. Then using the Alfresco Bulk FileSystem Import Tool (BFSIT), it is really fast to import documents into Alfresco, as soon as they are on the File System of the Alfresco Server. For the download, it is just the opposite (using the BFSET)…

For files smaller than 512MB, this solution is probably slower than the default Alfresco upload/download actions, but for bigger files (or groups of files) it becomes very interesting. Jeff did some tests and, for one or several files with a total size of 3 or 4GB, the transfer using Globus followed by the import into Alfresco was 50 to 60% faster than the default Alfresco upload/download.

 

Later, Jose Portillo shared Solr Sharding best practices. Sharding is the action of splitting your index into Shards (parts of an index) to scale searching and indexing horizontally. The Shards can be stored on a single Solr Server or dispatched across several. Doing this basically increases search speed because the search is executed on all Shards. For the indexing of a single node there is no big difference, but for a full reindex it does increase performance a lot because you index several nodes at the same time across the Shards…

A single Shard can work well (according to the Alfresco Benchmark) with up to 50M documents. Therefore, using Shards is mainly for big repositories, but that doesn't mean there are no use cases where it would be interesting for smaller repositories; there are! If you want to increase your search/index performance, then start creating Shards much earlier.

For the Solr Sharding, there are two registration options:

  • Manual Sharding => You need to manually configure the IPs/Host where the Shards are located in the Alfresco properties files
  • Dynamic Sharding => Easier to setup and Alfresco automatically provide information regarding the Shards on the Admin interface for easy management

There are several methods of Shardings which are summarized here:

  • MOD_ACL_ID (ACL v1) => Sharding based on ACL. If all documents have the same ACL (same site for example), then they will all be on the same Shard, which might not be very useful…
  • ACL_ID (ACL v2) => Same as v1 except that it uses the murmur hash of the ACL ID and not its modulus
  • DB_ID (DB ID) => Default in Solr6. Nodes are evenly distributed on the Shards based on their DB ID
  • DB_ID_RANGE (DB ID Range) => You can define the DB ID range for which nodes will go to which Shard (E.g.: 1 to 10M => Shard-0 / 10M to 20M => Shard-1 / aso…)
  • DATE (Date and Time) => Assign date for each Shards based on the month. It is possible to group some months together and assign a group per Shard
  • PROPERTY (Metadata) => The value of some property is hashed and this hash is used for the assignment to a Shard so all nodes with the same value are in the same Shard
  • EXPLICIT (?) => This is an all-new method that isn’t yet on the documentation… Since there aren’t any information about this except on the source code, I asked Jose to provide me some information about what this is doing. He’ll look at the source code and I will update this blog post as soon as I receive some information!

Unfortunately, the Solr Sharding has only been available starting with Alfresco Content Services 5.1 (Solr 4) and only using the ACL v1 method. New methods were then added using the Alfresco Search Services (Solr 6). The availability of methods VS Alfresco/Solr versions has been summarized in Jose’s presentation:

DevCon2018_ShardingMethodsAvailability

Jose also shared a comparison matrix of the different methods to choose the right one for each use case:

DevCon2018_ShardingMethodsFeatures

Some other best practices regarding the Solr Sharding:

  • Replicate the Shards to increased response time and it also provides High Availability so… No reasons not to!
  • Backup the Shards using the provided Web Service so Alfresco can do it for you for one or several Shards
  • Use DB_ID_RANGE if you want to be able to add Shards without having to perform a full reindex, this is the only way
  • If you need another method than DB_ID_RANGE, then plan carefully the number of Shards to be created. You might want to overshard to take into account the future growth
  • Keep in mind that each Shard will pull the changes from Alfresco every 15s and it all goes to the DB… It might create some load there and therefore be sure that your DB can handle that
  • As far as I know, at the moment, the Sharding does not support Solr in SSL. Solr should anyway be protected from external accesses because it is only used by Alfresco internally so this is an ugly point so far but it’s not too bad. Sharding is pretty new so it will probably support the SSL at some point in the future
  • Tune properly Solr and don’t forget the Application Server request header size
    • Solr4 => Tomcat => maxHttpHeaderSize=…
    • Solr6 => Jetty => solr.jetty.request.header.size=…

 

The day-2 started with a session from John Newton, who presented the impact of emerging technologies on content. As usual, John’s presentation had a funny theme incorporated in the slides and this time it was Star Wars.

DevCon2018_StarWars

 

After that, I attended the Hack-a-thon showcase, presented/introduced by Axel Faust. In the Alfresco world, Hack-a-thons are:

  • There since 2012
  • Open-minded and all about collaboration. Therefore, the output of any project is open source and available for the community. It’s not about money!
  • Always the source of great add-ons and ideas
  • 2 times per year
    • During conferences (day-0)
    • Virtual Hack-a-thon (36h ‘follow-the-sun’ principle)

A few of the 16 teams that participated in the Hack-a-thon presented the result of their Hack-a-thon day and there were really interesting results for ACS, ACS on AWS, APS, aso…

Apart from that, I also attended all the lightning talks on this day-2 as well as presentations on PostgreSQL and Solr HA/backup solutions and best practices. The presentations about PostgreSQL and Solr were interesting, especially for newcomers, because they really explained what should be done to have a highly available and resilient Alfresco environment.

 

There were too many lightning talks to mention them all but, as always, some were quite interesting and I just have to mention the talk about the ContentCraft plugin (from Roy Wetherall). There cannot be an Alfresco event (be it a Virtual Hack-a-thon, BeeCon or DevCon now) without an Alfresco integration into Minecraft. Every year, Roy keeps adding new stuff to his plugin… I remember that years ago, Roy was already able to create a building in Minecraft where the height represented the number of folders stored in Alfresco and the depth the number of documents inside, if my memory is correct (this has changed now: it represents the number of sub-folders). This year, Roy presented the new version and it’s even more incredible! Now if you stand in front of the building’s door, you can see the name and creator of the folder on a ‘Minecraft sign’. Then you can walk into the building and there is a corridor. On both sides, there are rooms which represent the sub-folders. Again, there are ‘Minecraft signs’ there with the name and creator of the sub-folders. Until then, it’s just the same thing again so that’s cool, but it will get even better!

If you walk into a room, you will see ‘Minecraft bookshelves’ and ‘Minecraft chests’. Bookshelves are just there for decoration but if you open the chests, then you will see, represented by ‘Minecraft books’, all the Alfresco documents stored in this sub-folder! Then if you open a book, you will see the content of this Alfresco document! And even crazier, if you update the content of the book in Minecraft and save it, the document stored in Alfresco will reflect this change! This is way too funny :D.

It’s all done using CMIS so there is nothing magical… Yet it really makes you wonder if there are any limits to what Alfresco can do ;).

 

If I dare to say: long live Alfresco! And see you around again for the next DevCon.

 

 

The post Alfresco DevCon 2018 – Day 2 – Big files, Solr Sharding and Minecraft, again! appeared first on Blog dbi services.

Critical Patch Update for January 2018 Now Available

Steven Chan - Thu, 2018-01-18 11:37

The Critical Patch Update (CPU) for January 2018 was released on January 16, 2018. Oracle strongly recommends applying the patches as soon as possible.

The Critical Patch Update Advisory is the starting point for relevant information. It includes a list of products affected, pointers to obtain the patches, a summary of the security vulnerabilities, and links to other important documents. 

Supported products that are not listed in the "Supported Products and Components Affected" Section of the advisory do not require new patches to be applied.

The Critical Patch Update Advisory is available at the following location:

It is essential to review the Critical Patch Update supporting documentation referenced in the Advisory before applying patches.

The next four Critical Patch Update release dates are:

  • April 17, 2018
  • July 17, 2018
  • October 16, 2018
  • January 15, 2019
Categories: APPS Blogs

Column Stats

Jonathan Lewis - Thu, 2018-01-18 08:22

I’ve made several comments in the past about the need for being selective when gathering object statistics, with particular reference to the trade-offs when creating histograms. With Oracle 12c it’s now reasonably safe (as far as I’m concerned) to set a method_opt as a table preference that identifies columns where you expect to see Frequency or (pace the buggy behaviour described in a recent post) Top-N histograms. The biggest problem I have is that I keep forgetting the exact syntax I need – so I’ve written this note more as a reminder to myself than anything else.

Typically I might expect to use the standard 254 buckets when gathering histograms, with an occasional variation to increase the bucket count; but for the purposes of this note I’m going to demonstrate with a much lower value. So here’s a table creation statement (running 12.1.0.2 – so it will gather basic stats on the create) and two variations of a call to gather stats with a specific method_opt – followed by a question:

create table t1
as
select
        object_type o1,
        object_type o2,
        object_type o3,
        object_id,
        object_name
from
        all_objects
where
        rownum <= 50000 -- > comment to bypass wordpress format problem
;

select  column_name, num_distinct, histogram, num_buckets, to_char(last_analyzed,'hh24:mi:ss')
from    user_tab_cols where table_name = 'T1' order by column_id;

execute dbms_lock.sleep(2)

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt=>'for all columns size 1 for columns o1 o2 o3 size 15'
        );
end;
/

select  column_name, num_distinct, histogram, num_buckets, to_char(last_analyzed,'hh24:mi:ss')
from    user_tab_cols where table_name = 'T1' order by column_id;

execute dbms_lock.sleep(2)

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt=>'for all columns size 1 for columns size 15 o1 o2 o3'
        );
end;
/

select  column_name, num_distinct, histogram, num_buckets, to_char(last_analyzed,'hh24:mi:ss')
from    user_tab_cols where table_name = 'T1';


The big question is this: which columns will have histograms after each of the gather_table_stats() calls:

method_opt=>'for all columns size 1 for columns o1 o2 o3 size 15'
method_opt=>'for all columns size 1 for columns size 15 o1 o2 o3'

The problem I have is simple – to me both options look as if they will create histograms on all three named columns but the first option is the one that I type in “intuitively” if I don’t stop to think about it carefully. The first option, alas, will only gather a histogram on column o3 – the second option is the one that creates three histograms.

The manuals are a little unclear and ambiguous about how to construct a slightly complicated method_opt; there’s a fragment of text with the usual mix of square brackets, italics and ellipses to indicate optional and repeated clauses (interestingly the only clue about multiple columns is that comma separation seems to be required – despite one of the examples above working without commas) but there’s no explanation of when a “size” clause should go before a column list and when it should go after.

So here are a few more method_opt clauses – can you work out in advance which columns would have histograms if you used them, and how many buckets each histogram would have? There are a couple that may surprise you (a quick way to test them for yourself is sketched after the list):


for columns o1 size 12, o2 size 13, o3 size 14

for columns o1 size 15 o2 size 16 o3 size 17

for columns size 18 o1 size 19 o2 size 20 o3

for columns size 21 o1 o2 size 22 o3

for columns o1 size 12, o2 size 12, o3 size 13, object_id size 13 object_name size 14

for columns size 22 o1 o2 for columns size 23 o3 object_id for columns size 24  object_name
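
If you want to check your answers empirically, each clause can be plugged into the same gather-and-query pattern used above – a minimal sketch using the first clause from the list:

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt=>'for columns o1 size 12, o2 size 13, o3 size 14'
        );
end;
/

select  column_name, histogram, num_buckets
from    user_tab_cols where table_name = 'T1' order by column_id;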

Bottom line – to me – is to check very carefully that the method_opt is going to do what I want it to do; and for production systems I tend to use the final form that repeats the “for columns {size clause} {column list}” pattern.
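
And since the opening paragraph mentioned setting method_opt as a table preference, here’s a minimal sketch of how that could be done for the demo table, so that future default gathers pick up the form I trust (the preference value is simply the second option from above):

begin
        dbms_stats.set_table_prefs(
                ownname => user,
                tabname => 'T1',
                pname   => 'METHOD_OPT',
                pvalue  => 'for all columns size 1 for columns size 15 o1 o2 o3'
        );
end;
/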

Error importing a DMP file to a very different database

Tom Kyte - Thu, 2018-01-18 07:06
I created a completely new oracle database and I am trying to import a DMP file from a full backup of another database and I am getting several errors. Can you help me: <b>Command:</b> impdp system/welcome1 full=yes directory=BACKUPSDR dumpfile=bc...
Categories: DBA Blogs

Making a URL call in PL/SQL

Tom Kyte - Thu, 2018-01-18 07:06
Hi Tom, I want to open Apex Application in New Window or New Tab through Oracle EBS Home page. I have tried with different method and finally came up with solution based on below link https://asktom.oracle.com/pls/apex/asktom.search?tag=making...
Categories: DBA Blogs
