"When things go wrong, they point the finger at the database, the OS, or the hardware" <-- yes, and when this happens it's just a matter of gathering the facts, numbers, and figures so we can point the finger back at them. Take a look at the screenshots below; each one is a different scenario where things went bad.

PGA reaching 30GB when developers fire up new reports that do massive hash joins. These eat up the server memory, which causes kswapd to kick in and the server to swap at a high rate. That shows up as CPU I/O wait and a high load average... basically killing the server.
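To make that PGA-to-swap chain visible in numbers, here is a minimal Python sketch that flags snapshots where high PGA and heavy swap-out happen together. It assumes you have already pulled per-snapshot values out of AWR, for example PGA allocated from DBA_HIST_PGASTAT and swap-out bytes from DBA_HIST_OSSTAT (the VM_OUT_BYTES stat name and the row layout are assumptions to verify on your platform). The `Snap` class, `flag_memory_pressure`, the thresholds, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snap:
    snap_id: int
    pga_gb: float            # PGA allocated at this snapshot (point-in-time value)
    vm_out_bytes_total: int  # cumulative swap-out bytes since startup (assumed source: OSSTAT)


def flag_memory_pressure(snaps, pga_limit_gb, swap_out_mb_limit):
    """Return snap_ids where PGA exceeds pga_limit_gb AND the swap-out
    since the previous snapshot exceeds swap_out_mb_limit."""
    flagged = []
    for prev, cur in zip(snaps, snaps[1:]):
        # OS counters are cumulative, so the per-interval value is the difference.
        swap_out_mb = (cur.vm_out_bytes_total - prev.vm_out_bytes_total) / 1024**2
        if swap_out_mb < 0:
            continue  # counter reset (e.g. instance restart); skip this interval
        if cur.pga_gb > pga_limit_gb and swap_out_mb > swap_out_mb_limit:
            flagged.append(cur.snap_id)
    return flagged


# Hypothetical sample data, one row per AWR snapshot.
snaps = [
    Snap(100, 8.0, 0),
    Snap(101, 9.5, 10 * 1024**2),
    Snap(102, 30.0, 5000 * 1024**2),
    Snap(103, 28.0, 9000 * 1024**2),
    Snap(104, 7.0, 9050 * 1024**2),
]
print(flag_memory_pressure(snaps, pga_limit_gb=20, swap_out_mb_limit=500))  # [102, 103]
```

Requiring both conditions is the point: high PGA alone may be harmless, and swapping alone may come from something outside the database, but the two together in the same interval is the pattern described above.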

A new batch of reports was introduced, and we found out that the developers were testing stuff in the PROD environment

Load average spike

Sudden 15GB/s of reads caused by just two SQLs
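A per-second read figure like this comes from dividing a per-interval delta by the interval length. A minimal Python sketch of that arithmetic, which also ranks SQLs by their share of the reads within one snapshot interval, is below. It assumes rows shaped like (SQL_ID, PHYSICAL_READ_BYTES_DELTA) from DBA_HIST_SQLSTAT; treat that column choice as an assumption to check against your Oracle version. The function name, SQL IDs, and byte counts are hypothetical.

```python
GB = 1024**3


def top_readers(rows, interval_seconds, top_n=2):
    """rows: list of (sql_id, physical_read_bytes_delta) for ONE snapshot interval.
    Returns [(sql_id, gb_per_sec, share_of_total_reads)] for the top_n readers."""
    total = sum(b for _, b in rows)
    ranked = sorted(rows, key=lambda r: r[1], reverse=True)[:top_n]
    return [
        # Average rate over the interval, plus this SQL's fraction of all reads.
        (sql_id, b / GB / interval_seconds, b / total if total else 0.0)
        for sql_id, b in ranked
    ]


# Hypothetical deltas for a 60-minute snapshot interval (AWR's default).
rows = [
    ("sql_a", 36000 * GB),
    ("sql_b", 18000 * GB),
    ("sql_c", 3600 * GB),
    ("sql_d", 2400 * GB),
]
for sql_id, rate, share in top_readers(rows, interval_seconds=3600):
    print(f"{sql_id}: {rate:.1f} GB/s, {share:.0%} of reads")
```

Keep in mind that an average over a full snapshot interval understates a short burst, which is one more reason to drill into ASH for the exact window.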

And since the data points come from AWR, you can drill down on the snap_ids, generate an ASH report for that time period, pick out the SQLs... and regroup with the developers ;)
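The drill-down starts with finding which snap_ids the spike covers. A minimal Python sketch follows, assuming you have exported one metric per snapshot (for example, load average) together with each snapshot's begin and end time. It treats any snapshot above `factor` times the median as part of the spike and returns the first contiguous run. The resulting time window is what you would feed to an ASH report (e.g. `$ORACLE_HOME/rdbms/admin/ashrpt.sql`). The function name, the median-based rule, and the sample data are hypothetical choices for illustration.

```python
from datetime import datetime
from statistics import median


def spike_window(snaps, factor=3.0):
    """snaps: list of (snap_id, begin_time, end_time, value), ordered by snap_id.
    Returns (first_snap_id, last_snap_id, begin_time, end_time) for the first
    contiguous run whose value exceeds factor * median, or None if there is none."""
    baseline = median([s[3] for s in snaps])
    run = []
    for snap in snaps:
        if snap[3] > factor * baseline:
            run.append(snap)
        elif run:
            break  # the first contiguous spike run has ended
    if not run:
        return None
    return run[0][0], run[-1][0], run[0][1], run[-1][2]


# Hypothetical hourly snapshots with a load average value per snapshot.
loads = [2.0, 2.5, 3.0, 25.0, 30.0, 2.0, 2.2]
snaps = [
    (200 + i, datetime(2024, 1, 1, i), datetime(2024, 1, 1, i + 1), load)
    for i, load in enumerate(loads)
]
print(spike_window(snaps))  # snap_ids 203..204, 03:00 to 05:00
```

The median is used as the baseline rather than the mean so that the spike itself does not inflate the threshold it is being compared against.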

