RE: Sun Boxes Crashing

From: <>
Date: Fri, 8 Sep 2000 09:28:01 +1100

and yet some more feedback.....

Sun Admits to Memory Problem
By Jaikumar Vijayan

FRAMINGHAM, 28 August, 2000

Problems with a memory component that Sun Microsystems Inc. has been quietly trying to fix for the past
several months are continuing to plague some large users of Sun's Ultra Enterprise Unix servers. And Sun
has gone to extraordinary lengths to keep its customers quiet about the issue.

The problem involves an external memory cache on Sun's UltraSPARC II microprocessor module. Under
certain conditions, it has been triggering system failures and frequent server reboots at dozens of customer

Sun Executive Vice President John Shoemaker this week acknowledged that the company has been grappling
with memory-related problems on "a few dozen" of its Ultra Enterprise servers for nearly a year.

Sun customers who have been affected by the problem are unwilling to speak openly about it because Sun
has persuaded many of them to sign nondisclosure agreements, said Tom Henkel, an analyst at Gartner Group
Inc. in Stamford, Conn.

The nondisclosure agreements were apparently offered with a claim that signing them would bolster Sun's
commitment to resolving the problem quickly, Henkel said. Sun customers began reporting the problem as
long as 18 months ago, he said.

Shoemaker this week acknowledged that it may have been a bad idea for Sun to get its users to sign
nondisclosure agreements. But he said the company took that measure only because Sun itself was
struggling to pinpoint a reason for the system failures. He added that Sun has stopped requiring such agreements.

The long-standing nature of the problem and Sun's handling of the issue raise troubling questions about the
quality of Sun's hardware and support, Henkel said.

One high-profile customer that has had very public problems with Sun hardware is eBay Inc. The online auctioneer
has suffered a series of hardware-related outages over the past year, including one this week. It is unclear whether
eBay's problems are related to the memory issue, however.

Gartner plans soon to release an advisory on the memory component issue, updating one released in November,
because of continued and "frequent client complaints of persistent downtime" caused by the problem.

Sun insisted this week that the problem hasn't caused any data loss for customers. But the frequency of reboots
disrupts availability and can cause data loss if applications don't restart properly, users said.

In the past year, Henkel said, he has talked with at least 50 Sun customers who complained of hardware reliability
issues caused by defective memory. Systems affected by the problem appear to be those based on 400-MHz
UltraSPARC-II CPU modules using either a 4MB or 8MB cache.

"There are a lot of very unhappy campers out there," Henkel said. "Sun has been experimenting for too long now
to find a solution to this problem."

Meta Group Inc. in Stamford, Conn., also has clients that have experienced the problem.

"There was a rash of reliability issues relating to this problem in the March-to-April time frame," though none since then, said Meta Group analyst Brian Richardson. Eight out of 20 of Meta's large Sun accounts reported the problem, Richardson said.

According to Shoemaker, the issue has triggered a massive overhaul of Sun's quality processes and has already
directly resulted in about eight major hardware and software changes being incorporated into Sun's Ultra Enterprise server line.

Sun has also put in place far more rigorous quality and availability testing of its products and is mandating more
stringent audits of customer sites, environmental conditions and planned configurations before taking orders on its high-end servers, Shoemaker said.

By year's end, Sun will release a mirrored memory module that should address this issue once and for all, Shoemaker
added. In the past several months, Sun has also been in direct contact with the CIOs at several of the affected companies to explain Sun's new quality initiative, he said.

"This has been a watershed event for Sun," Shoemaker said, adding that the company has moved from the back of the
class to class leader with respect to quality.

But according to an MIS manager in North Carolina who has experienced the memory problem and who spoke on
condition of anonymity, Sun has offered no explanation for the problems. "Sun has not disclosed any information to
me about their memory issues - not even a brief description," the manager said.

In the past three months, all of the manager's six Sun servers have crashed because of memory-related problems,
he said. In each instance, Sun swapped out entire CPU modules but offered no explanation for doing so, he said.

A user at a Midwestern manufacturing company, who also spoke on condition of anonymity, had a similar experience.

"As soon as we reported the issue to Sun, the affected processors were replaced under service contract," he said.
The company was able to resolve the problem by rearranging "our data center with the express purpose of lowering
system temperatures," he said. "The systems run 10 to 15 degrees Fahrenheit cooler than before, and we haven't seen
a problem since."

According to Shoemaker, Sun hasn't been able to narrow the problem to any one specific cause. Sun believes the
problems may have been caused by a combination of factors, including defective components from one of Sun's
suppliers, poor packaging of the memory chips on the system boards and environmental factors.

Meghan Holohan contributed to this report.

"Wasserman, Sara" <> on 08/09/2000 06:50:37

To: Multiple recipients of list ORACLE-L
cc: (bcc: GRANT G HOLYOAKE/NSO/CSDA)
Subject: RE: Sun Boxes Crashing

Sun's memory cache problem:

> -----Original Message-----
> From: Rama Malladi []
> Sent: Wednesday, September 06, 2000 2:41 PM
> To: Multiple recipients of list ORACLE-L
> Subject: Sun Boxes Crashing
> We have several Sun boxes (Solaris 2.6) running Oracle 8 and 8i. One of the
> boxes (description given below) kept rebooting and this machine happens to
> run one of the most critical billing systems (Murphy's law!).
> Overall, this machine rebooted some 40 times, in a period of 2 months and
> some nights, it rebooted as many as 10 times! Our SysAdmin contacted Sun
> Engineers and they never told us what exactly was the problem, and kept
> replacing CPUs, memory boards, SCSI cards etc ... This happened several
> times and last week there was an article in Computer Weekly magazine
> saying several customers were having this kind of problem on Sun boxes
> and Sun tried to hush up the matter ...!!
> Has anybody else faced this kind of situation?
> Just curious ...
> Rama
> =================================
> System Configuration: Sun Microsystems sun4u 8-slot Sun Enterprise
> E4500/E5500
> SunOS uscaelmux06 5.6 Generic_105181-21 sun4u sparc SUNW,Ultra-Enterprise
> --
> Author: Rama Malladi
> Fat City Network Services -- (858) 538-5051 FAX: (858) 538-5051
> San Diego, California -- Public Internet access / Mailing Lists
> --------------------------------------------------------------------
> To REMOVE yourself from this mailing list, send an E-Mail message
> to: (note EXACT spelling of 'ListGuru') and in
> the message BODY, include a line containing: UNSUB ORACLE-L
> (or the name of mailing list you want to be removed from). You may
> also send the HELP command for other information (like subscribing).

Author: Wasserman, Sara

Received on Thu Sep 07 2000 - 17:28:01 CDT