Clustering
The other side of the distributed computing coin is clustering (spreading processing across multiple CPUs), which is nothing new. In the mid-1980s, Digital's VAX minicomputers could be operated in clusters. In recent years, the focus has shifted towards using large numbers (dozens or hundreds) of commodity PCs. At first these were stacks of desktop or tower machines linked by 10-megabit Ethernet, but rack-mounted systems are now typical because they take up much less space. Faster interconnects, 100-megabit or even gigabit, are now usual.
Such clusters provide a substantial amount of processing power at comparatively low cost, and are most often found in research environments. Much of the current activity can be traced to NASA's Beowulf project in the mid-1990s, which harnessed commodity hardware and Linux to create a system capable of processing large data sets. A high-end example of this type of system is the one recently installed at Swinburne University of Technology and described in our case study. Also in Melbourne, CSIRO Health Sciences and Nutrition operates a more modest cluster of 64 Athlon CPUs (a mixture of 1.3GHz and 1.6GHz chips) for drug design. Branson explains that some of the protein receptors being tested are considered proprietary, so they are tested on the cluster rather than on a grid to ensure security.
Cluster or grid
"The appropriateness of clustering depends on the nature of the problem being addressed, as it needs to be capable of being broken down into smaller chunks and the results reassembled," says Mayo. Determining in advance which problems are amenable is largely a case of "holding a wet finger in the air", he jokes, but anything that is non-symmetric can be a problem. That said, clusters are being used successfully for a variety of applications such as jet engine design and data analysis. For example, a cluster of 1024 Sparc CPUs is used to analyse data collected at the Stanford Linear Accelerator.
"Traditional high-performance computing (HPC) vendors went out of business because the market was not big enough to sustain development," says Mayo, "so people want to use off-the-shelf components. What's changing is that whereas universities and other organisations would build their own clusters, they now look to vendors to help build and support the systems. Issues such as availability, security, and upgrades are becoming more important than finding out whether the technology works."
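Mayo's test (can the problem be broken into chunks whose results are then reassembled?) can be sketched in a few lines of Python. This is a minimal, hypothetical illustration. Worker threads stand in for cluster nodes, and the per-chunk job, a partial sum of squares, is an arbitrary placeholder. On a real cluster the chunks would be shipped to separate machines over a message-passing layer, and the function names here are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_chunks):
    """Divide data into n_chunks contiguous slices of near-equal size."""
    size, rem = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < rem else 0)  # spread the remainder
        chunks.append(data[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Placeholder for the work one node does independently: a partial sum of squares."""
    return sum(x * x for x in chunk)


def reassemble(partials):
    """Combine the partial results into the final answer."""
    return sum(partials)


def run(data, n_nodes=4):
    """Split, process the chunks concurrently (threads stand in for nodes), reassemble."""
    chunks = split(data, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return reassemble(partials)


if __name__ == "__main__":
    print(run(list(range(1, 101))))  # sum of squares 1..100: prints 338350
```

Summing the partial results works because addition does not care about the order or grouping of the pieces. A problem whose parts depend heavily on one another would not split this cleanly, which is roughly the kind of problem Mayo warns about.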
This concept is becoming mainstream. Forrester Research has started talking about Organic IT, which it defines as computing infrastructure built on cheap, redundant components that automatically shares and manages enterprise computing resources (software, processors, storage, and networks) across applications. Clearly, this definition can encompass grids and clusters, though Forrester doesn't predict widespread adoption of Organic IT until 2004.
Cluster-funk
Not everyone is so positive about the benefits of cluster computing. Referring to the processing power achieved by NEC's Earth Simulator supercomputer, Bob Bishop, chairman and CEO of SGI, says "the world of clustered PCs as a substitute for supercomputing is in tatters right now. The US has spent US$1 billion on clustering, and failed miserably."
"Clusters have fallen behind the leading edge by a factor of 100," he adds. "It's called B-grade science . . . causing Australians to think they have high-performance computing when they don't," says Bishop, himself an Australian whose first degree was in mathematical physics. He says there are two main problems with using clusters. Firstly, it is hard to administer the software across multiple nodes. Secondly, and more fundamentally, most HPC jobs require a single, large memory space, or else communication between the processes becomes a bottleneck.
SGI's multiprocessor systems use an architecture called NUMAflex, which allows each node to access memory attached to other nodes with very little increase in latency, thanks to a system of crossbar switches and high-speed cabling.
"The cluster model works well where the data is uniform across the space," he says, "but that is not true of problems in physics and chemistry. A cluster gives you theoretical horsepower, but you cannot achieve good utilisation of those processor cycles. You bought them cheaply, but you can't use them." Under-funded scientists have been pushed into using clusters, "but that's trapped them in a dead end," suggests Bishop.
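Bishop's utilisation argument can be illustrated with a deliberately crude model. This is an assumption made for illustration only, not anything SGI has published. Each of N nodes does 1/N of the work but pays a fixed communication cost for every other node it must exchange data with. The function names and all numbers are hypothetical.

```python
def run_time(work, nodes, comm_per_peer):
    """Toy model: compute time shrinks as 1/nodes, communication grows with the number of peers."""
    return work / nodes + comm_per_peer * (nodes - 1)


def efficiency(work, nodes, comm_per_peer):
    """Fraction of the purchased processor cycles doing useful work (1.0 = perfect)."""
    return work / (nodes * run_time(work, nodes, comm_per_peer))


if __name__ == "__main__":
    for n in (1, 10, 100):
        t = run_time(1000.0, n, 1.0)
        e = efficiency(1000.0, n, 1.0)
        print(f"{n:>4} nodes: time {t:7.1f}, utilisation {e:6.1%}")
```

With these made-up parameters, 100 nodes finish no sooner than 10 nodes (109 time units in both cases). Utilisation therefore drops from about 92 percent to about 9 percent, which is the "cheap cycles you can't use" effect Bishop describes.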
Even so, he does see a role for some forms of distributed computing. "Grid computing is definitely a requirement to provide the horsepower for the [scientific and engineering] community," he says. SGI's view is that HPC is not just about number crunching: visualisation and collaboration with other researchers around the world are both critical. "It's got to be a visual grid; you need an overlay to provide the immersion," says Bishop. That, he argues, will allow Australian scientists and engineers to overcome the tyranny of distance and participate in the large northern hemisphere markets without leaving the country.
"We know how to be globally competitive in sport . . . we need to be globally competitive in science and engineering," says Bishop, adding that Australian industry will lag behind "if we do not build the necessary IT capabilities". Some Australian organisations in the oil and gas, defence, and education sectors are moving in the right direction, "but we haven't rallied around the history of science [and] mobilised our young scientists," Bishop asserts. Australia was a world leader in remote connectivity 50 years ago, he says, "but we didn't follow through on that lead."
"Sun gets involved in a lot of collaborative stuff," says Mayo, citing work being done by the Defence Science and Technology Organisation on next-generation command and control systems. The project involves giving distributed command headquarters knowledge management tools and the ability to share information. "It's not changing the decision-making process, it's assisting it," he says.
Foster sees grid computing and clusters as complementary. "Grid computing technology allows you to coordinate and manage multiple distributed resources, and clustering supports a collection of resources under a single administrative domain. For grid computing to move forward, clustering needs to become more sophisticated, so that you can start to externally manage it to do this dynamic provisioning."
He questions whether managing resources within a cluster and across a grid can be done with the same technologies. "When you talk to IBM, they think it is the case, but when you talk to Sun, they think it's completely different."
Harnessing the home user
Miller is not sure about the potential for harnessing the unused power of home PCs: "So far, at least, no one's made a business of that. There are security and trust issues, and also interpersonal trust issues [in] trying to convince people."
As for the idea of taking advantage of wasted internal CPU resources, "it isn't obvious to me that spare cycles could be put to good use," he says. For example, Microsoft tried using the PCs on ordinary employees' desks to stress-test software overnight, but found it was not practical: it was too hard to collect the information generated, and there was too much variation in system configurations.
But Foster believes there are some situations where spare CPU cycles can be put to work. US broker Charles Schwab is looking to improve the utilisation of its large systems, which are sized to support double the peak load. "They've got a lot of empty space over a 24-hour period. They want to start using that space for computation-intensive things like options pricing and portfolio management activities," he explains.
Projects such as Folding@Home and SETI@Home have shown that computationally intensive tasks amenable to a divide-and-conquer approach can be distributed among a large number of volunteered machines. The approach suits situations where huge data sets can be divided into small chunks that can be processed independently. On the pragmatic rather than the technical side, the key considerations appear to be that a section of the community considers the goal worthwhile, that the sponsoring organisation is trusted, and that the project is run on a non-commercial basis.
But as hardware gets faster, new software functions emerge to apply that power to the benefit of the primary user. When you're not actively using your PC, Miller asks, would you rather it reindexed your files so you can find a piece of content more quickly, or worked on a problem for someone else?
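The divide-and-conquer pattern behind such projects can be sketched as a server that cuts a data set into small work units and hands them to volunteer machines. Any unit whose volunteer disappears is reissued. This is a hypothetical, single-process simulation, not the actual Folding@Home or SETI@Home software. Volunteers are plain Python functions, a return value of None stands for a machine that went offline, and `analyse()` is a placeholder computation. All names are invented for illustration.

```python
from collections import deque


def make_work_units(data, unit_size):
    """Cut a large data set into small work units that can be processed independently."""
    return {uid: data[i:i + unit_size]
            for uid, i in enumerate(range(0, len(data), unit_size))}


def analyse(unit):
    """Placeholder per-unit computation (hypothetical): count values above 5."""
    return sum(1 for x in unit if x > 5)


def reliable_volunteer(unit):
    return analyse(unit)


def make_flaky_volunteer():
    """A volunteer whose machine drops out on every other request (returns None)."""
    calls = 0

    def volunteer(unit):
        nonlocal calls
        calls += 1
        return None if calls % 2 else analyse(unit)

    return volunteer


def distribute(units, volunteers, max_dispatches=1000):
    """Hand units out round-robin; put a unit back in the queue if its volunteer fails."""
    queue = deque(units)  # ids of units not yet completed
    results = {}
    dispatches = 0
    while queue:
        if dispatches >= max_dispatches:
            raise RuntimeError("too many failed dispatches")
        uid = queue.popleft()
        volunteer = volunteers[dispatches % len(volunteers)]
        dispatches += 1
        outcome = volunteer(units[uid])
        if outcome is None:
            queue.append(uid)  # reissue later, possibly to another volunteer
        else:
            results[uid] = outcome
    return results


if __name__ == "__main__":
    units = make_work_units(list(range(20)), unit_size=4)
    results = distribute(units, [reliable_volunteer, make_flaky_volunteer()])
    print(sorted(results), sum(results.values()))  # prints [0, 1, 2, 3, 4] 14
```

Because each unit is independent, it does not matter which machine processes it or in what order the results come back. That independence is what makes the pattern tolerant of unreliable, volunteered machines.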
RAC 'em up
A different approach to clustering, and one directly applicable to business systems, can be found in Oracle's Real Application Clusters (RAC).
"New technology such as the Grid initiative and the availability of high-speed interconnects for Intel-standard computers facilitates distributed computing," says Roland Slee, director, business and technology solutions, Oracle Australia. But there are still challenges for general business computing. "Business systems almost always have . . . the desire to let a large number of people share access to a single set of information," he says, which means the database layer is often the central component.
While clustering was widely accepted for high-performance computing in the research and scientific community (Oracle uses it internally for compiling large pieces of software), the approaches used prior to Oracle 9i RAC could not address the database issue, he says. "RAC makes it possible to apply clustering to mainstream computing, even mission-critical commercial computing," claims Slee.
Deploying a database-centric application in a distributed environment requires two capabilities, he asserts. The first is the ability to operate with no single point of failure. Since such systems are usually based on commodity hardware that is not engineered for high availability, there is a reasonable chance of one node failing. RAC handles such failures transparently, unlike DB2 or SQL Server, he says. A side effect is the ability to add more nodes without shutting down and reorganising the database.
Secondly, users want the ability to deploy a database that is physically and logically unchanged from the single-node version. This allows a seamless move to a distributed environment, since no changes need to be made to the database design or to the application to make it cluster-aware. All that is necessary is to install the RAC option and restart Oracle 9i with some parameter changes.
Apart from offering high availability and allowing incremental growth, clustering promises a better price-performance ratio, because four two-way or four-way systems are cheaper than one eight-way or 16-way server with the same total number of processors. Furthermore, RAC allows the use of blade servers at the database layer, reducing costs even further.
RAC's shared-disk architecture (using either SAN or NAS) means data is equally accessible from every node, because none of the data is owned by a particular node. As a result, no replication is needed within the cluster. "It's been widely understood that this is impossible, so it's difficult to convince people that it is possible . . . and now easy," says Slee. Replication can still be used, via Oracle technology, to provide failover to a backup site.
"With clustering, Linux becomes a very attractive system for running mainstream computing," says Slee, especially as Oracle collaborated with Red Hat to create Red Hat Advanced Server for highly available, highly manageable, and highly scalable systems.
According to Slee, as few as two or three percent of Oracle 8i customers used clustering, but he expects that to increase to at least 50 percent with 9i RAC. "Fully clusterable, fault-tolerant systems are needed in the database, application server, HTTP server, and cache layers, and that's what Oracle has today and no other vendor has," he says.
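The difference between a shared-disk design, as Slee describes RAC, and a partitioned (shared-nothing) design can be illustrated with a toy model in which a node failure is simulated by marking the node as down. This is a hypothetical sketch, not Oracle's implementation. It ignores caching, locking and the inter-node coordination a real shared-disk database needs, and the class names and the character-sum partitioning function are invented for illustration.

```python
class SharedDiskCluster:
    """Every node reads the same storage, so no row is owned by any node."""

    def __init__(self, storage, n_nodes):
        self.storage = storage
        self.up = set(range(n_nodes))

    def fail(self, node):
        self.up.discard(node)

    def read(self, key):
        if not self.up:
            raise RuntimeError("no nodes available")
        return self.storage[key]  # any surviving node can serve any row


class SharedNothingCluster:
    """Each row lives only on the node that owns its partition."""

    def __init__(self, storage, n_nodes):
        self.n = n_nodes
        self.partitions = {i: {} for i in range(n_nodes)}
        for key, value in storage.items():
            self.partitions[self.owner(key)][key] = value
        self.up = set(range(n_nodes))

    def owner(self, key):
        # Deterministic toy partitioning (hypothetical): sum of character codes.
        return sum(map(ord, key)) % self.n

    def fail(self, node):
        self.up.discard(node)

    def read(self, key):
        node = self.owner(key)
        if node not in self.up:
            raise RuntimeError(f"partition on node {node} is unavailable")
        return self.partitions[node][key]


if __name__ == "__main__":
    rows = {"alice": 1, "bob": 2, "carol": 3}
    shared = SharedDiskCluster(rows, 3)
    shared.fail(0)
    print(shared.read("alice"))  # still readable from a surviving node
```

In the partitioned model, keeping every row available after a node failure would mean copying each partition to other nodes. That is the within-cluster replication the shared-disk approach avoids.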