Re: HOw to introduce a proactive approach on DB use?

From: Tim X <timx_at_nospam.dev.null>
Date: Sat, 11 Oct 2008 18:24:50 +1100
Message-ID: <87zllb8ugt.fsf@lion.rapttech.com.au>


Helma <helma.vinke_at_hotmail.com> writes:

> hello everyone,
>
> I just arrived at a shop where there are monthly performance problems.
> At every end of the month, endusers are running reports (business
> objects) and Peoplesoft batches.
> Problem is, that there is no overview with the DBA-team of the
> database use, e.g. :
>
> - a nightbatch was accidentally started during daytime, the DBA team
> was unable to figure this out. (enduser found out about this one)
>

That seems a bit surprising. We have had similar problems at times and our DBAs have always been able to determine what was consuming all the resources.

> - an other batch exeeded it's running time by more than 200% - DBA
> team found out when looking for something else.
>
> - the same report is ran by different users
>
> - there are no ad hoc queries , but non of the batches and reports are
> examined for performance.
>
> etc.
>
> I wonder how to get control over this situation and introduce a
> proactive approach.
>
> E.g. , I need an overview of the database use, and as a second target,
> be able to quickly identify reports/ batches that are running amok.
>
> If anyone has pointers on how they've dealt with such a situation, i
> would be glad to hear.
>
>
> Solaris, Oracle 9.2.0.5, app server
>
> Thank you for your time,

  1. If you have access to the code and if its using PL/SQL or Java, consider adding dbms_application_info statements. this can help the DBAs identify what is going on.
  2. See if you can schedule the regular reports rather than allowing end users to execute them whenever they want. for example, if you know that some end users need a particular report on the last Thursday of each month and if it doesn't need to be run immediately, schedule the report to run during a low demand period, such as 3am (watch out for backups etc).

What we ended up doing was implementing another layer between the app server and the database. this layer established a hierarchical form of access control. End users were still able to request reports, but unless they had the necessary access rights, their report would be scheduled to run during a low demand period. some senior staff had the authority to execute a report immediately. In most cases, staff had no problems with having to wait for a report (it did require some change management and some users complained because they werre now required to 'think ahead'. However, the outcome did mean that they actually got their reports and usually got them at an expected time. Previously, there was a lot of variation - once they got use to having to 'order' reports earlier, they appreciated the fact that the reports turned up in a more predictable way).

Don't just consider technical solutions to the problem. Often, such problems are based around poor business practices. For example, we had a problem where the payroll report was taking far too long to run and it was getting worse every month. We did everything we could to tune things and get it running fast enough, but we soon reached a point of diminishing returns.

Further investigation found that the report was doing payment analysis for over 40,000 employees. We only have around 5,000. Some more analysis showed that the HR department had adopted procedures that didn't fit well with how the software worked. Rather than terminating a position when someone left, they updated it so that the employees records appeared to represent an active employee who was not getting paid at the moment. they had adopted this procedure because we have quite a high turn-over of casual staff and they wanted to make the re-hiring of casual staff easier. As a result, the payruns were taking longer and longer to execute. The situation became worse after an application upgrade. It turns out, the previous version had a bug that was fixed in the new version. The fixing of the bug resulted in correct processing of employee pay entitlements and as we had what appeared to be 40,000 staff that could be entitled to pay, the system now processed all those records.

this was a difficult one to resolve because it coincided with an applicaiton upgrade and a move to a new version of Oracle running under RAC. As we had moved to a new version of oracle and now had more processing power, the problem was not picked up in testing - reports were initially running within acceptable limits. However, after a few months, the processing time had really blown out. There were also some failures on the part of the HR systems administrator who was neglecting to run monthly 'clean-up' jobs. Again, a process problem rather than a technical problem.

-- 
tcross (at) rapttech dot com dot au
Received on Sat Oct 11 2008 - 02:24:50 CDT

Original text of this message