For the past year, Sun Microsystems Inc. has struggled to solve a mysterious fault that can cause its high-end servers to crash unexpectedly, an embarrassing problem for a computer maker that routinely refers to its servers as "rock solid" reliable.
The Palo Alto, Calif., company said the problem, which was eventually traced to a memory flaw, is rare and probably affects fewer than 1% of all computers it has sold.
Since the problem was first identified in its servers, the large computers often used to manage databases and handle e-commerce tasks, Sun (Nasdaq: SUNW) engineers have put together a variety of hardware and software fixes that appear to reduce the risk of spontaneous crashes. Sun has also revamped internal quality programs designed to prevent reliability problems in the first place.
"Sometimes in life, a problem becomes an answer," says John Shoemaker, Sun's executive vice president for system products, who says the problem has pushed Sun to think harder about ways to make its systems more reliable. "We're feeling fairly confident we have this thing covered."
Some critics, however, argue that the problem is more serious than Sun is willing to admit. Paul McGuckin, an analyst with Gartner Group who deals regularly with major corporate customers, said that roughly 60 major Gartner clients have reported problems with as many as several hundred Sun servers.
"There are a lot of unhappy Sun customers out there," says McGuckin, who notes that many Gartner clients complained that Sun took too long to acknowledge the problem's significance and that some believe the computer maker tried to squelch open discussion of the issue.